[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749905#action_12749905
 ] 

Mridul Muralidharan commented on PIG-940:
-

Is this supported in hadoop ? As in, can you specify the input to be on a 
different hdfs and get a mapred job to work ? IIRC no, but I could be missing 
something.

If it is no, then not sure if pig can support it without an intermediate distcp 
...

 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


 I have a script which does the following.. access data from a remote HDFS 
 location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I 
 do not want to copy this huge amount of data between HDFS locations]].
 However I want my Pigscript  to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at 
 

[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750040#action_12750040
 ] 

Koji Noguchi commented on PIG-940:
--

bq. Is this supported in hadoop ? 
Sure.

 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


 I have a script which does the following.. access data from a remote HDFS 
 location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I 
 do not want to copy this huge amount of data between HDFS locations]].
 However I want my Pigscript  to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
 at 
 

[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-08-31 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749722#action_12749722
 ] 

Viraj Bhat commented on PIG-940:


One important point to add:
{code}
localmachine.company.com prompt hadoop fs -ls 
hdfs://remotemachine1.company.com/user/viraj//*.txt
-rw-r--r--   3 viraj users 13 2009-08-13 23:42 /user/viraj/A1.txt
-rw-r--r--   3 viraj users  8 2009-08-29 00:51 /user/viraj/B1.txt
{code}

 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


 I have a script which does the following.. access data from a remote HDFS 
 location (via a HDFS installed at:hdfs://remotemachine1.company.com/ ) [[as I 
 do not want to copy this huge amount of data between HDFS locations]].
 However I want my Pigscript  to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at