[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749905#action_12749905 ]

Mridul Muralidharan commented on PIG-940:
-----------------------------------------

Is this supported in Hadoop? As in, can you specify the input to be on a different HDFS and get a mapred job to work? IIRC no, but I could be missing something.
If not, then I am not sure Pig can support it without an intermediate distcp ...

Cross site HDFS access using the default.fs.name not possible in Pig

Key: PIG-940
URL: https://issues.apache.org/jira/browse/PIG-940
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Environment: Hadoop 20
Reporter: Viraj Bhat
Fix For: 0.3.0

I have a script which reads data from a remote HDFS location (an HDFS installed at hdfs://remotemachine1.company.com/), as I do not want to copy this huge amount of data between HDFS locations. However, I want my Pig script to write its output to the HDFS running on localmachine.company.com.
Currently Pig does not support that behavior and complains that hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.

{code}
A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b);
B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d);
C = JOIN A by a, B by c;
store C into 'output' using PigStorage();
{code}

===
2009-09-01 00:37:24,032 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localmachine.company.com:8020
2009-09-01 00:37:24,277 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localmachine.company.com:50300
2009-09-01 00:37:24,567 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage-POForEach to POJoinPackage
2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2009-09-01 00:37:24,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2009-09-01 00:37:26,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2009-09-01 00:37:26,249 [Thread-9] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-09-01 00:37:26,746 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-09-01 00:37:26,747 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2009-09-01 00:37:26,756 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
2009-09-01 00:37:26,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
===

The error file in Pig contains:

===
ERROR 2998: Unhandled internal error. org.apache.pig.backend.executionengine.ExecException: ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at
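
The resolution behavior the report expects can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Pig's or Hadoop's actual code: the class PathQualifier, the method qualify, and the working directory /user/viraj are hypothetical names chosen for the example. The idea is that a location which already carries a scheme and host keeps them, and only an unqualified location such as 'output' is resolved against the default file system.

```java
import java.net.URI;

public class PathQualifier {
    /**
     * Hypothetical helper (an assumption for illustration, not Pig code).
     * Returns the location unchanged when it already names a file system;
     * otherwise resolves it against the default file system.
     */
    public static URI qualify(String location, URI defaultFs, String workingDir) {
        URI uri = URI.create(location);
        if (uri.getScheme() != null && uri.getAuthority() != null) {
            // Fully qualified: keep the remote scheme and host as given.
            return uri;
        }
        String path = uri.getPath();
        if (!path.startsWith("/")) {
            // Relative path: place it under the (assumed) working directory.
            path = workingDir + "/" + path;
        }
        return URI.create(defaultFs.getScheme() + "://" + defaultFs.getAuthority() + path);
    }

    public static void main(String[] args) {
        // Default file system taken from the log above; working directory is assumed.
        URI defaultFs = URI.create("hdfs://localmachine.company.com:8020");
        String workingDir = "/user/viraj";
        System.out.println(qualify("hdfs://remotemachine1.company.com/user/viraj/A1.txt", defaultFs, workingDir));
        System.out.println(qualify("output", defaultFs, workingDir));
    }
}
```

Under this sketch the LOAD location stays on remotemachine1.company.com while the store location lands on localmachine.company.com. The error in the report suggests that the host part of the LOAD location was replaced by the default file system's host before validation.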
[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750040#action_12750040 ]

Koji Noguchi commented on PIG-940:
----------------------------------

bq. Is this supported in hadoop ?

Sure.
[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig
[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749722#action_12749722 ]

Viraj Bhat commented on PIG-940:
--------------------------------

One important point to add:

{code}
localmachine.company.com prompt hadoop fs -ls hdfs://remotemachine1.company.com/user/viraj//*.txt
-rw-r--r-- 3 viraj users 13 2009-08-13 23:42 /user/viraj/A1.txt
-rw-r--r-- 3 viraj users 8 2009-08-29 00:51 /user/viraj/B1.txt
{code}