Using NLineInputFormat in Hive
Hi, I have a requirement where I have to send one line of a file to each mapper, but I am doing it through Hive. How can we get the functionality of NLineInputFormat in Hive? I couldn't find this, so I tried the following configuration in Hive:
set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapreduce.input.fileinputformat.split.maxsize=100;
I have a small file, but I need each row to go to a different mapper. With the above configuration, however, some of the rows go missing depending on the max split size I provide. Thanks,
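In plain MapReduce (outside Hive), the one-line-per-mapper behaviour is what NLineInputFormat provides. A minimal driver sketch, assuming the newer mapreduce API and an illustrative input path taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class OneLinePerMapperDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "one-line-per-mapper");
    job.setJarByClass(OneLinePerMapperDriver.class);
    // Cut the input into splits of one line each, so every line goes to its own map task.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    // Mapper/reducer/output settings omitted; they are unchanged by the input format.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}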
Definite API for Video Processing
Hi, I am able to find that we have a definite API for processing images in Hadoop using HIPI. Why don't we have the same for videos? Thanks, Subbu
Sending the entire file content as value to the mapper
Hi Team, I have a file which has semi-structured text data with no definite start and end points. How can I send the entire content of the file at once, as the key or value, to the mapper instead of line by line? Thanks, Subbu
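One common pattern for this, sketched below under the assumption of the newer mapreduce API, is a custom input format that refuses to split the file plus a record reader that emits the whole file as a single value. The class name is illustrative, not a built-in Hadoop API:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // keep each file in a single split
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new RecordReader<NullWritable, BytesWritable>() {
      private FileSplit fileSplit;
      private Configuration conf;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false;

      @Override
      public void initialize(InputSplit s, TaskAttemptContext ctx) {
        fileSplit = (FileSplit) s;
        conf = ctx.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (processed) return false;
        // Read the whole file into one value; only sensible for files that fit in memory.
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
      @Override public BytesWritable getCurrentValue() { return value; }
      @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
      @Override public void close() {}
    };
  }
}

The mapper then receives one NullWritable/BytesWritable pair per file, and the bytes can be parsed however the semi-structured data requires.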
Output Directory not getting created
Hi, there is a production cluster where MapR (with Hadoop) is installed under user A. I am trying to run a Hadoop job as another user, B. The job is unable to create its output in the file system under user B, and fails with the following errors:
13/07/03 09:34:00 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/03 09:34:00 INFO mapred.JobClient: Creating job's output directory at maprfs:/user/B/obfl/PQRPT_OBFL_UPTIME_F
13/07/03 09:34:00 INFO mapred.JobClient: Creating job's user history location directory at maprfs:/user/B/obfl/PQRPT_OBFL_UPTIME_F/_logs
2013-07-03 09:34:16,0527 ERROR Client fs/client/fileclient/cc/client.cc:852 Thread: 140084677048064 Rmdir failed for dir _logs, error Permission denied(13)
2013-07-03 09:34:16,0527 ERROR Client fs/client/fileclient/cc/client.cc:925 Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0527 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1293 Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_UPTIME_F, rpc error, Permission denied(13)
2013-07-03 09:34:16,0590 ERROR Client fs/client/fileclient/cc/client.cc:852 Thread: 140084677048064 Rmdir failed for dir _logs, error Permission denied(13)
2013-07-03 09:34:16,0590 ERROR Client fs/client/fileclient/cc/client.cc:925 Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0590 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1293 Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_LOG_D, rpc error, Permission denied(13)
2013-07-03 09:34:16,0749 ERROR Client fs/client/fileclient/cc/client.cc:852 Thread: 140084677048064 Rmdir failed for dir _logs, error Permission denied(13)
2013-07-03 09:34:16,0749 ERROR Client fs/client/fileclient/cc/client.cc:925 Thread: 140084677048064 Rmdirs failed for dir/file _logs, rpc error 13
2013-07-03 09:34:16,0749 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1293 Thread: 140084677048064 remove: File /user/B/obfl/PQRPT_OBFL_UPDATIME_AGG, rpc error, Permission denied(13)
Thanks, Subbu
Counters across all jobs
Hi, I have around 4 jobs running in a controller. How can I have a single, shared counter that is present in all the jobs and incremented wherever it is used in a job? For example, consider a counter ACount. If job1 increments the counter by 2, job3 by 5, and job4 by 6, can I have the counter displayed in the jobtracker as job1:2, job2:2, job3:7, job4:13? Thanks, Subbu
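Counters are scoped to a single job, so one workable approach, sketched below with an illustrative enum and class name, is to let each job increment its own counter and have the driver read the completed jobs' counters and print a running total in the desired job1/job2/job3/job4 form:

import org.apache.hadoop.mapreduce.Job;

public class RunningCounterReport {
  // One logical counter, declared once and incremented in any job's mappers/reducers via
  //   context.getCounter(AppCounters.ACOUNT).increment(n);
  public enum AppCounters { ACOUNT }

  // Called in the driver after the controller's jobs have completed, in order.
  public static void report(Job... jobs) throws Exception {
    long runningTotal = 0;
    for (Job job : jobs) {
      runningTotal += job.getCounters().findCounter(AppCounters.ACOUNT).getValue();
      System.out.println(job.getJobName() + ": " + runningTotal);
    }
  }
}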
Re: Operations after killing a job
I prefer not to use Oozie if possible. What I want to know is where control reaches in the Hadoop classes if I kill a job explicitly. That is, can I know which class gets control, if possible? Thanks, Subbu On Sunday, July 1, 2012, Harsh J wrote: A framework like Oozie can help you do what you need here. Take a look at its workflow running/management features: http://incubator.apache.org/oozie/ Is this what you're looking for? On Sat, Jun 30, 2012 at 3:30 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi, if I have a few jobs added to a controller and I explicitly kill a job in between (assuming all the other jobs then fail due to the dependency), can I get control back to perform some operations after that, or is there any API to do that? Please correct me if my question is wrong to begin with. Thanks in advance, Subbu -- Harsh J
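One way to get control back without Oozie, sketched here with the old mapred jobcontrol API (the group name and helper method are illustrative), is to run the JobControl on its own thread, wait until every job has either succeeded or failed, and then inspect the failed (or killed-and-dependent) jobs before doing any follow-up work:

import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ControllerDriver {
  public static void runWithCleanup(Job... controlledJobs) throws InterruptedException {
    JobControl controller = new JobControl("subbu-jobs");
    for (Job j : controlledJobs) {
      controller.addJob(j);
    }

    Thread runner = new Thread(controller);
    runner.start();
    while (!controller.allFinished()) {
      Thread.sleep(5000);              // poll until every job has succeeded or failed
    }
    controller.stop();                 // stop the controller thread

    // Control is back in the driver here: failed (or killed) jobs can be examined
    // and any compensating or cleanup operations performed.
    for (Job failed : controller.getFailedJobs()) {
      System.err.println("Did not complete: " + failed.getJobName());
    }
  }
}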
Re: Passing key-value pairs between chained jobs?
Hi Michael, the problem in your second question can be solved if you use SequenceFileOutputFormat for the first job's output and SequenceFileInputFormat for the second job's input. On Thu, Jun 14, 2012 at 11:11 PM, Michael Parker michael.g.par...@gmail.com wrote: Hi all, one more question. I have two jobs to run serially using a JobControl. The key-value types for the outputs of the reducer of the first job are ActiveDayKey, Text, where ActiveDayKey is a class that implements WritableComparable. And so the key-value types for the inputs to the mapper of the second job are ActiveDayKey, Text. I'm noticing two things: First, in the output of the reducer from the first job, each ActiveDayKey object is being written as a string using its toString method. Since it's a subclass of WritableComparable that already knows how to serialize itself using write(DataOutput), is there any way to exploit that to write it in binary format? Otherwise, do I need to write a subclass of FileOutputFormat? Second, the second job fails with java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to co.adhoclabs.LogProcessor$ActiveDayKey. I'm assuming this is because by default the key type is Long for the line number, and here I want to ignore the line number and use the ActiveDayKey written on the line itself as the key. Again, since ActiveDayKey knows how to deserialize itself using readFields(DataInput), is there any way to exploit that to read it from the line in binary format? Do I need to write a subclass of FileInputFormat? Assuming I need to write subclasses of the FileOutputFormat and FileInputFormat classes, what's a good example of this? The terasort example? Thanks, Mike
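A minimal wiring sketch of that suggestion, assuming the newer mapreduce API, Mike's ActiveDayKey class on the classpath, and an illustrative intermediate path:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainWiring {
  public static void wire(Job job1, Job job2) throws Exception {
    Path intermediate = new Path("/tmp/activeday-intermediate");   // illustrative path

    // Job 1: write ActiveDayKey/Text pairs in binary SequenceFile form instead of toString().
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    job1.setOutputKeyClass(ActiveDayKey.class);    // Mike's WritableComparable key class
    job1.setOutputValueClass(Text.class);
    SequenceFileOutputFormat.setOutputPath(job1, intermediate);

    // Job 2: read the same pairs back, so the mapper sees ActiveDayKey keys
    // rather than LongWritable line offsets.
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(job2, intermediate);
  }
}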
Re: Consistent Checksum error using SequenceFileInputFormat against segment/content segment/parse_text folders output by Nutch.
Hi Ali, I also faced this error when I ran jobs either locally or on a cluster. I was able to solve the problem by removing the .crc file created in the input folder for the job. Please check that there is no .crc file in the input. I hope this solves the problem. Thanks, Subbu On Wed, May 9, 2012 at 1:31 PM, Ali Safdar Kureishy safdar.kurei...@gmail.com wrote: Hi, I've included both the Nutch and Hadoop mailing lists, since I don't know which one of the two is the root cause for this issue, and it might be possible to pursue a resolution from both sides. What I'm trying to do is to dump the contents of all the fetched pages from my Nutch crawl -- about 600K of them. I've tried extracting this information initially from the segment/parse_text folder, but I kept receiving the error below, so I switched over to the segment/content folder, but BOTH of these consistently give me the following Checksum Error exception which fails the map-reduce job. At the very least I'm hoping to get some tip(s) on how to ignore this error and let my job complete.
org.apache.hadoop.fs.ChecksumException: Checksum Error
at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:499)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1522)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
I'm using the SequenceFileInputFormat to read the data in each case. I have also attached the Hadoop output (checksum-error.txt). I have no idea how to ignore this error or to debug it. I've tried setting the boolean io.skip.checksum.errors property to true on the MapReduce Conf object, but it makes no difference. The error still happens consistently, so it seems like I'm either not setting the right property, or that it is being ignored by Hadoop? Since the error is thrown down in the internals of Hadoop, there doesn't seem to be any other way to ignore the error either, without changing Hadoop code (which I'm not able to do at this point). Is this a problem with the data that was output by Nutch? Or is this a bug in Hadoop? Btw, I ran Nutch in local mode (without Hadoop), and I'm running the Hadoop job (below) purely as an application from Eclipse (not via the bin/hadoop script). Any help or pointers on how to dig further with this would be greatly appreciated. If there is any other way for me to ignore these checksum errors and let the job complete, do please share that with me as well.
Here is the code for the reader job using MapReduce:
package org.q.alt.sc.nutch.readerjobs;

import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.nutch.protocol.Content;

public class SegmentContentReader extends Configured implements Tool {
  /** @param args */
  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SegmentContentReader(), args);
    System.exit(exitCode);
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.printf( Usage: %s [generic
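A hedged, code-level variant of the .crc suggestion above: when the segment data is read from the local file system, checksum verification can be disabled on that FileSystem object (this affects local file reads only, not the map-side merge where the trace above originates). The class name is illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;

public class LocalFsWithoutChecksums {
  // Open the local file system with .crc verification disabled, so stray or stale
  // checksum files under the input folder are not validated on read.
  public static LocalFileSystem open(Configuration conf) throws IOException {
    LocalFileSystem localFs = FileSystem.getLocal(conf);
    localFs.setVerifyChecksum(false);
    return localFs;
  }
}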
Re: Getting filename in case of MultipleInputs
Yeah Jim, I have gone through the comments in that JIRA ticket and was able to solve my problem. On Sat, May 5, 2012 at 11:25 PM, Jim Donofrio donofrio...@gmail.com wrote: There is already a JIRA for this: MAPREDUCE-1743 On 05/03/2012 09:06 AM, Harsh J wrote: Subbu, the only way I can think of is to use an overridden InputFormat/RecordReader pair that sets the map.input.file config value during its initialization, using the received FileSplit object. This should be considered a bug, however, and even 2.x is affected. Can you please file a JIRA on https://issues.apache.org/jira/browse/MAPREDUCE and post back the ID on this thread for posterity? On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote: Hi, could anyone suggest how to get the filename in the mapper? I have gone through the JIRA ticket saying that map.input.file doesn't work in the case of multiple inputs; TaggedInputSplit also doesn't work on the 0.20.2 version as it is not a public class. I tried to find another approach but could find none in my search. Could anyone suggest a solution other than these? Thanks in advance, Subbu.
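A rough sketch of the workaround Harsh describes, using the old mapred API; the class name is illustrative, and whether the mapper observes the property depends on when it reads its configuration:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class FileNameTextInputFormat extends TextInputFormat {
  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    if (split instanceof FileSplit) {
      // Record which file this split came from, since MultipleInputs hides it.
      job.set("map.input.file", ((FileSplit) split).getPath().toString());
    }
    return super.getRecordReader(split, job, reporter);
  }
}

The idea is to register this wrapper as the InputFormat for each path passed to MultipleInputs, so the real FileSplit's path is recorded before the records are read.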
Re: Cleanup after a Job
Hi Arun, I can see that the output committer is present in the reducer. How do I make sure that this commit happens at the end of the job, or does it run by default at the end of the job? I can have more than one reduce task. On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote: Use OutputCommitter.(abortJob, commitJob): http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html Arun On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote: Hi, I have a few jobs added to a Job controller. I need an afterJob() to be executed after the completion of a Job. For example, here I am actually overriding the Job of JobControl. I have Job2 depending on the output of Job1. The input for Job2 is obtained after doing some file system operations on the output of Job1. This operation should happen in an afterJob() method which is available for each Job. How do I make sure that the afterJob() method is called for each Job added to the controller before running the jobs that depend on it? Thanks -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: Cleanup after a Job
Hi Robert, could you point me to the exact method of the JobControl Job or JobConf that calls the commitJob method? Thanks. On Tue, May 1, 2012 at 7:36 PM, Robert Evans ev...@yahoo-inc.com wrote: Either abortJob or commitJob will be called for all jobs. abortJob will be called if the job has failed; commitJob will be called if it succeeded. The purpose of these is to commit the output of the map/reduce job and clean up any temporary files/data that might be lying around. commitTask/abortTask is similar, and is called for each individual task. --Bobby Evans On 5/1/12 8:32 AM, kasi subrahmanyam kasisubbu...@gmail.com wrote: Hi Arun, I can see that the output committer is present in the reducer. How do I make sure that this commit happens at the end of the job, or does it run by default at the end of the job? I can have more than one reduce task. On Sun, Apr 29, 2012 at 11:28 PM, Arun C Murthy a...@hortonworks.com wrote: Use OutputCommitter.(abortJob, commitJob): http://hadoop.apache.org/common/docs/r1.0.2/api/org/apache/hadoop/mapred/OutputCommitter.html Arun On Apr 26, 2012, at 4:44 PM, kasi subrahmanyam wrote: Hi, I have a few jobs added to a Job controller. I need an afterJob() to be executed after the completion of a Job. For example, here I am actually overriding the Job of JobControl. I have Job2 depending on the output of Job1. The input for Job2 is obtained after doing some file system operations on the output of Job1. This operation should happen in an afterJob() method which is available for each Job. How do I make sure that the afterJob() method is called for each Job added to the controller before running the jobs that depend on it? Thanks -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
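A hedged sketch of where commitJob fits, using the old mapred API (the class name is illustrative): the framework invokes the job's committer once, after all tasks, so afterJob()-style work can live in an override registered on that job's JobConf.

import java.io.IOException;
import org.apache.hadoop.mapred.FileOutputCommitter;
import org.apache.hadoop.mapred.JobContext;

public class AfterJobCommitter extends FileOutputCommitter {
  @Override
  public void commitJob(JobContext context) throws IOException {
    super.commitJob(context);   // commit the job output as usual
    // afterJob()-style work goes here, e.g. file system moves that prepare
    // the input that the next job in the controller expects
  }

  @Override
  public void abortJob(JobContext context, int runState) throws IOException {
    super.abortJob(context, runState);
    // cleanup for the failure path
  }
}

The committer would then be registered on that job's JobConf with conf.setOutputCommitter(AfterJobCommitter.class) before the job is added to the controller.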
Re: cygwin single node setup
Hi Onder, you could try formatting the namenode and restarting the daemons; that solved my problem most of the time. Maybe the running daemons were not able to pick up all of the datanode configuration. On Sat, Apr 28, 2012 at 4:23 PM, Onder SEZGIN ondersez...@gmail.com wrote: Hi, I am pretty much a newbie and I am following the quick start guide for a single-node setup on Windows using Cygwin. In this step, $ bin/hadoop fs -put conf input I am getting the following errors. I have got no files under /user/EXT0125622/input/conf/capacity-scheduler.xml. That might be a reason for the errors I get, but why does Hadoop look for such a directory when I have not configured anything like that? So, supposedly, Hadoop is making up and looking for such a file and directory? Any idea and help is welcome. Cheers Onder
12/04/27 13:44:37 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
12/04/27 13:44:37 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/27 13:44:37 WARN hdfs.DFSClient: Could not get block locations. Source file /user/EXT0125622/input/conf/capacity-scheduler.xml - Aborting...
put: java.io.IOException: File /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1
12/04/27 13:44:37 ERROR hdfs.DFSClient: Exception closing file /user/EXT0125622/input/conf/capacity-scheduler.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 at
Re: Unable to set the heap size on Amazon elastic mapreduce
Hi Shrish, you need to increase the heap size of the JVM for the mapred.child changes you made to take effect. These changes need to be made in the files of the Hadoop configuration folder. In hadoop-env.sh, increase the heap size to, say, 2000 (I think this property is commented out by default). Then in mapred-default.xml (in the src folder), override the other properties such as mapred.map.child.java.opts. I hope this helps. On Thu, Apr 5, 2012 at 1:26 PM, shrish bajpai random_unk...@hotmail.co.uk wrote: Hi, I have tried the following combinations of bootstrap actions to increase the heap size of my job, but none of them seem to work:
--mapred-key-value mapred.child.java.opts=-Xmx1024m
--mapred-key-value mapred.child.ulimit=unlimited
--mapred-key-value mapred.map.child.java.opts=-Xmx1024m
--mapred-key-value mapred.map.child.ulimit=unlimited
-m mapred.map.child.java.opts=-Xmx1024m
-m mapred.map.child.ulimit=unlimited
-m mapred.child.java.opts=-Xmx1024m
-m mapred.child.ulimit=unlimited
What is the right syntax? Thanks Shrish
Re: Getting NullPointerException while running word count example
Hi Sujit, I think it is a problem with the host names configuration. Could you please check whether you added the host names of the master and the slaves in the etc/hosts file of all the nodes. On Mon, Apr 2, 2012 at 8:00 PM, Sujit Dhamale sujitdhamal...@gmail.comwrote: Can some one please look in to below issue ?? Thanks in Advance On Wed, Mar 7, 2012 at 9:09 AM, Sujit Dhamale sujitdhamal...@gmail.com wrote: Hadoop version : hadoop-0.20.203.0rc1.tar Operaring Syatem : ubuntu 11.10 On Wed, Mar 7, 2012 at 12:19 AM, Harsh J ha...@cloudera.com wrote: Hi Sujit, Please also tell us which version/distribution of Hadoop is this? On Tue, Mar 6, 2012 at 11:27 PM, Sujit Dhamale sujitdhamal...@gmail.com wrote: Hi, I am new to Hadoop., i install Hadoop as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluste while running Word cont example i am getting *NullPointerException *can some one please look in to this issue ?* *Thanks in Advance* !!! * duser@sujit:~/Desktop/hadoop$ bin/hadoop dfs -ls /user/hduser/data Found 3 items -rw-r--r-- 1 hduser supergroup 674566 2012-03-06 23:04 /user/hduser/data/pg20417.txt -rw-r--r-- 1 hduser supergroup1573150 2012-03-06 23:04 /user/hduser/data/pg4300.txt -rw-r--r-- 1 hduser supergroup1423801 2012-03-06 23:04 /user/hduser/data/pg5000.txt hduser@sujit:~/Desktop/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/data /user/hduser/gutenberg-outputd 12/03/06 23:14:33 INFO input.FileInputFormat: Total input paths to process : 3 12/03/06 23:14:33 INFO mapred.JobClient: Running job: job_201203062221_0002 12/03/06 23:14:34 INFO mapred.JobClient: map 0% reduce 0% 12/03/06 23:14:49 INFO mapred.JobClient: map 66% reduce 0% 12/03/06 23:14:55 INFO mapred.JobClient: map 100% reduce 0% 12/03/06 23:14:58 INFO mapred.JobClient: Task Id : attempt_201203062221_0002_r_00_0, Status : FAILED Error: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820) 12/03/06 23:15:07 INFO mapred.JobClient: Task Id : attempt_201203062221_0002_r_00_1, Status : FAILED Error: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820) 12/03/06 23:15:16 INFO mapred.JobClient: Task Id : attempt_201203062221_0002_r_00_2, Status : FAILED Error: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820) 12/03/06 23:15:31 INFO mapred.JobClient: Job complete: job_201203062221_0002 12/03/06 23:15:31 INFO mapred.JobClient: Counters: 20 12/03/06 23:15:31 INFO mapred.JobClient: Job Counters 12/03/06 23:15:31 INFO mapred.JobClient: Launched reduce tasks=4 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22084 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all reduces waiting after 
reserving slots (ms)=0 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/03/06 23:15:31 INFO mapred.JobClient: Launched map tasks=3 12/03/06 23:15:31 INFO mapred.JobClient: Data-local map tasks=3 12/03/06 23:15:31 INFO mapred.JobClient: Failed reduce tasks=1 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16799 12/03/06 23:15:31 INFO mapred.JobClient: FileSystemCounters 12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_READ=740520 12/03/06 23:15:31 INFO mapred.JobClient: HDFS_BYTES_READ=3671863 12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2278287 12/03/06 23:15:31 INFO mapred.JobClient: File Input Format Counters 12/03/06 23:15:31 INFO mapred.JobClient: Bytes Read=3671517 12/03/06 23:15:31 INFO mapred.JobClient: Map-Reduce Framework 12/03/06 23:15:31 INFO mapred.JobClient: Map output materialized bytes=1474341 12/03/06 23:15:31 INFO mapred.JobClient: Combine output records=102322 12/03/06
Re: Read key and values from HDFS
Hi Pedro, I am not sure we have a single method for reading the data in output files across different output formats. But for sequence files we can use the SequenceFile.Reader class in the API to read them. On Fri, Mar 30, 2012 at 10:49 PM, Pedro Costa psdc1...@gmail.com wrote: The ReduceTask can save the file using several output formats: InternalFileOutputFormat, SequenceFileOutputFormat, TeraOutputFormat, etc... How can I read the keys and the values from the output file? Can anyone give me an example? Is there a way to create just one method that can read all the different output formats? Thanks, -- Best regards,
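A minimal sketch of reading such a file back with SequenceFile.Reader, assuming an illustrative class name and that the key/value classes written by the job are on the classpath:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);                 // e.g. a part-00000 file from the reducer
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // Instantiate the key/value types recorded in the file header.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}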
Re: start hadoop slave over WAN
Try checking the logs in the logs folder for the datanode. It might give some lead. Maybe there is a mismatch between the namespace IDs recorded on the namenode and the datanode when the datanode starts. On Fri, Mar 30, 2012 at 10:32 PM, Ben Cuthbert bencuthb...@ymail.com wrote: All, we have a master in one region and we are trying to start a slave datanode in another region. When executing the scripts it appears to log in to the remote host, but it never starts the datanode. When executing HBase, though, it does work. Is there a timeout or something with Hadoop?
Re: Job tracker service start issue.
Hi Oliver, I am not sure my suggestion might solve your problem or it might be already solved on your side. It seems the task tracker is having a problem accessing the tmp directory. Try going to the core and mapred site xml and change the tmp directory to a new one. If this is not yet working then manually change the permissions of theat directory using : chmod -R 777 tmp On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallou olivier.sal...@irisa.frwrote: Le 3/23/12 8:50 AM, Manish Bhoge a écrit : I have Hadoop running on Standalone box. When I am starting deamon for namenode, secondarynamenode, job tracker, task tracker and data node, it is starting gracefully. But soon after it start job tracker it doesn't show up job tracker service. when i run 'jps' it is showing me all the services including task tracker except Job Tracker. Is there any time limit that need to set up or is it going into the safe mode. Because when i saw job tracker log this what it is showing, looks like it is starting the namenode but soon after it shutdown: 2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: / STARTUP_MSG: Starting JobTracker STARTUP_MSG: host = manish/10.131.18.119 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2-cdh3u3 STARTUP_MSG: build = file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 10:22:53 PST 2012 / 2012-03-22 23:26:04,140 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred 2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 54311 2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. 
Opening the listener on 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030 2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1 2012-03-22 23:26:09,517 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030 2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions. 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)' 2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ... org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:54310/app/hadoop/tmp/mapred/system is not owned by mapred at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2241) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2050) at