Any suggestions?
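One thing I am planning to try in the meantime is pinning the port the AM listens on for client status calls, so that the firewall rule can stay fixed. Below is a minimal sketch of mapred-site.xml on the cluster, assuming yarn.app.mapreduce.am.job.client.port-range is the property that controls this port and that 50100-50200 is a free range on our nodes (both are assumptions on my part):

    <!-- Sketch: restrict the MapReduce AM's client-facing port to a fixed range
         so a static firewall rule between the client and the cluster nodes can
         cover it. Property name and port range are assumptions, not verified
         on this cluster. -->
    <property>
      <name>yarn.app.mapreduce.am.job.client.port-range</name>
      <value>50100-50200</value>
    </property>

If that is not the right property for the client-to-AM status connection, please correct me.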
On Thu, Mar 13, 2014 at 1:07 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <[email protected]> wrote:
> Hello,
> I have a Hadoop cluster upgraded to Hadoop 2.x, and everything with it
> works fine (it runs M/R jobs and can perform actions on HDFS).
>
> When I run a Pig script, either from the Pig grunt shell or via
> pig -x mapreduce -f 'test.pig', it connects to the Hadoop cluster, starts
> the M/R job, and the M/R job completes fine. However, the shell hangs and
> never returns.
>
> 2014-03-13 00:17:04,341 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: https://apollo-jt.vip.org.com:50030/proxy/application_1394582929977_7433/
> 2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1394582929977_7433
> 2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
> 2014-03-13 00:17:04,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[4,4],C[6,4],B[5,4] C: C[6,4],B[5,4] R: C[6,4]
> 2014-03-13 00:17:04,365 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 0 time(s); maxRetries=45
> 2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 1 time(s); maxRetries=45
> 2014-03-13 00:17:36,232 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: <AM_Host_Name>/<AM_Host_IP>:47718. Already tried 2 time(s); maxRetries=45
> ...
>
> On analyzing this, I found that AM_Host_Name matches the Application
> Master of the M/R job.
>
> Questions:
> 1) Does the client machine attempt to connect to the Application Master in
> order to get the status of the M/R job?
> 2) If #1 is true, and since this is a secure Hadoop 2.x cluster, does that
> mean the firewall needs to be open on that port between the client and the
> Application Master (which can be any node in the cluster)?
> 3) I assumed #1 and #2 are true and therefore had the firewall opened for
> port 47718 between the client and all nodes in the Hadoop cluster (since
> any node can host the Application Master). However, to my surprise, I found
> that this port 47718 changed. Is there a setting, or a group of port
> numbers, used for the client-to-AM communication that reports status? If
> so, where can I find this list?
> 4) How do I get the grunt shell back and see the status/progress of the job
> from the client machine?
>
> --
> Deepak

--
Deepak
