Hi,

Thanks for your reply. I do not know about Cascading. Should I just Google "cascading in hadoop"? What I was thinking of instead is to implement a file system that overrides the methods provided by Hadoop's fs.FileSystem class. I have written enough portions of this file system (for my external server) that it compiles successfully.
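To give a concrete idea of what I mean, below is roughly the skeleton I am filling in. The scheme "myexternalserver" and the class name are placeholders for my actual code, the exact set of abstract methods depends on the Hadoop version, and most of the bodies are still stubs:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

// Placeholder name; my real class and scheme differ.
public class MyExternalFileSystem extends FileSystem {

  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    setConf(conf);
    this.uri = URI.create(name.getScheme() + "://" + name.getAuthority());
    // TODO: open the connection to my external server here
  }

  @Override
  public URI getUri() { return uri; }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    // TODO: stream the file contents from the external server
    throw new IOException("open() not implemented yet");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite,
      int bufferSize, short replication, long blockSize, Progressable progress)
      throws IOException {
    throw new IOException("create() not implemented yet");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
      throws IOException {
    throw new IOException("append() not implemented yet");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new IOException("rename() not implemented yet");
  }

  @Override
  public boolean delete(Path f) throws IOException { return delete(f, true); }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new IOException("delete() not implemented yet");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    // TODO: list the directory on the external server
    throw new IOException("listStatus() not implemented yet");
  }

  @Override
  public void setWorkingDirectory(Path newDir) { workingDir = newDir; }

  @Override
  public Path getWorkingDirectory() { return workingDir; }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new IOException("mkdirs() not implemented yet");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    // TODO: return length / mtime / isDir metadata from the external server
    throw new IOException("getFileStatus() not implemented yet");
  }
}

The idea is that each of these methods will eventually talk to the external server instead of HDFS.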
However, when I submit an MR job I get the following error:

13/03/26 06:09:10 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 0 time(s).
13/03/26 06:09:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 1 time(s).
13/03/26 06:09:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 2 time(s).
13/03/26 06:09:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 3 time(s).
13/03/26 06:09:14 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 4 time(s).
13/03/26 06:09:15 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 5 time(s).
13/03/26 06:09:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 6 time(s).
13/03/26 06:09:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 7 time(s).
13/03/26 06:09:18 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 8 time(s).
13/03/26 06:09:19 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54312. Already tried 9 time(s).
13/03/26 06:10:20 ERROR security.UserGroupInformation: PriviledgedActionException as:nikhil cause:java.net.ConnectException: Call to localhost/127.0.0.1:54312 failed on connection exception: java.net.ConnectException: Connection refused
java.net.ConnectException: Call to localhost/127.0.0.1:54312 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
    at org.apache.hadoop.ipc.Client.call(Client.java:1075)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at org.apache.hadoop.mapred.$Proxy2.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
    ... 27 more

Basically, my JobTracker is running at localhost:54312, and I have set fs.default.name to myexternalserver://ip:port and fs.myexternalserver.impl to the FileSystem class I wrote. I am not able to figure out why this error occurs. Why is it trying to connect to localhost:54312? Please suggest where I am going wrong. Also, if you feel Cascading would be better suited for this, please do let me know. (A rough sketch of how I am setting these values in my driver is at the bottom of this mail, below the quoted message.)

Thanks & Regards,
Nikhil

From: Agarwal, Nikhil
Sent: Tuesday, March 26, 2013 2:49 PM
To: '[email protected]'
Subject: How to tell my Hadoop cluster to read data from an external server

Hi,

I have a Hadoop cluster up and running. I want to submit an MR job to it, but the input data is kept on an external server (outside the Hadoop cluster). Can anyone please suggest how I can tell my Hadoop cluster to load the input data from the external server and then run MR on it?

Thanks & Regards,
Nikhil
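P.S. As mentioned above, here is roughly how the relevant settings are wired up in my driver. The class name com.example.MyExternalFileSystem is just a placeholder for my actual implementation, and the same properties could equally be set in core-site.xml / mapred-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DriverSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Default filesystem points at my external server
    conf.set("fs.default.name", "myexternalserver://ip:port");

    // Register my FileSystem implementation for that scheme
    // (placeholder class name for my actual implementation)
    conf.set("fs.myexternalserver.impl", "com.example.MyExternalFileSystem");

    // JobTracker address; the client connects here when the job is submitted
    conf.set("mapred.job.tracker", "localhost:54312");

    Job job = new Job(conf, "wordcount");
    // ... set mapper/reducer and input/output paths, then job.waitForCompletion(true)
  }
}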
