Why does not occur under a replica?

2014-10-05 Thread cho ju il
Hadoop 2.4.1: a DataNode disk failed, but no Under-Replicated Blocks are reported. Why not? If I restart the NameNode (HA), Under-Replicated Blocks do appear. NameNode logs: org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistrati

Reduce fails always

2014-10-05 Thread Abdul Navaz
Hi all, I am running a sample word count job on a 9-node cluster and I am getting the error message below. hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/file1.txt /user/hduser/getty/out10 -D mapred.reduce.tasks=2 14/10/05 18:08:45 INFO mapred.JobClient: map 99% reduce 26% 14/10/0

Re: Reduce phase of wordcount

2014-10-05 Thread Ulul
Hi, you indicate that you have just one reducer, which is the default in Hadoop 1 but quite insufficient for a cluster with 7 slave nodes. You should increase mapred.reduce.tasks, use combiners, and maybe tune mapred.tasktracker.reduce.tasks.maximum. Hope that helps, Ulul On 05/10/2014 16:53, R

Re: InputFormat for dealing with log files.

2014-10-05 Thread Guillermo Ortiz
Thank you, I didn't know about it. I have been looking for benchmarks of Joni vs. Java (default package); do you know of a website with results? Anyway, I'll try it myself tomorrow. - Original message - From: "Ted Yu" To: "common-u...@hadoop.apache.org" Sent: Sunday, October 5, 2014

Re: datanode down, disk replaced , /etc/fstab changed. Can't bring it back up. Missing lock file?

2014-10-05 Thread Colin Kincaid Williams
I could find no lock file on the datanode in any of the data dirs, so I cannot try "the suggested fix". On Fri, Oct 3, 2014 at 9:14 PM, Pradeep Gollakota wrote: > Looks like you're facing the same problem as this SO question. > http://stackoverflow.com/questions/10705140/hadoop-datanode-fails-to-

Re: Hadoop and RAID 5

2014-10-05 Thread Ulul
Hi Travis, thank you for your detailed answer and for honoring my question with a blog entry :-) I will look into bus quiescing with the admins, but I'm under the impression that nothing special is done, with the HW RAID controller taking care of everything; the HP doc states that inserting a hot-pluggabl

Re: InputFormat for dealing with log files.

2014-10-05 Thread Ted Yu
Regex processing is not that slow when best practices are followed. This project provides better performance than Java's: https://github.com/jruby/joni Cheers On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz wrote: > I thought of something like that, but I guess it should be a little

Re: InputFormat for dealing with log files.

2014-10-05 Thread Guillermo Ortiz
I thought of something like that, but I guess it should be a little more complex because it should look for a pattern, maybe a date format? One idea: if you know that the first 10 characters are the date, you could take them and try to match them against a date format, or something more generic like a regex, al
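The idea in this message (treat a line that starts with a date as the beginning of a new record, and attach every other line to the previous record) can be sketched in a few lines. This is only an illustration of the grouping logic, not a Hadoop InputFormat. The `YYYY-MM-DD` date layout, the name `group_records`, and the sample log lines are all assumptions made for the example.

```python
import re

# Assumption: every new log record begins with a date such as 2014-10-05.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_records(lines):
    """Yield one string per log record, joining continuation lines."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) and current:
            # A dated line begins a new record, so emit the previous one.
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        # Emit the last record once the input is exhausted.
        yield "\n".join(current)


# Hypothetical sample: a three-line exception followed by a one-line entry.
sample = [
    "2014-10-05 13:18:00 ERROR Servlet failed",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "2014-10-05 13:18:01 INFO Request served",
]
records = list(group_records(sample))
```

Lines that appear before the first dated line are grouped into a record of their own. A real InputFormat would also have to handle records that straddle split boundaries, which this sketch ignores.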

distcp between clusters with different linux accounts

2014-10-05 Thread Libo Yu
Hi all, I have two Hadoop clusters, but they were created under different Linux user accounts. Now, if I want to move some files between the two clusters, distcp fails with an access exception. That is because the two clusters are under different Linux user accounts. Is there a way to get around

Re: Reduce phase of wordcount

2014-10-05 Thread Renato Moutinho
Hi there, thanks a lot for taking the time to answer me! Actually, this "issue" happens after all the map tasks have completed (I'm looking at the web interface). I'll try to diagnose whether it's an issue with the number of threads. I suppose I'll have to change the logging configuration to fin

Re: InputFormat for dealing with log files.

2014-10-05 Thread Ted Yu
Have you read http://blog.rguha.net/?p=293 ? Cheers On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz wrote: > > I'd like to know if there's an InputFormat that can deal with log > files. The problem I have is that if I have to read a Tomcat log, > for example, sometimes the exception

RE: How to get the max number of reducers in Yarn

2014-10-05 Thread java8964
You should call setNumReduceTasks in your job; there is no longer a max reducer count in YARN. Setting the reducer count is more art than science. I think there is only one rule about it: don't set the number of reducers larger than the reducer input group count. Set the reducer nu

RE: Reduce phase of wordcount

2014-10-05 Thread java8964
Don't be confused by the 6.03 MB/s. The relationship between mappers and reducers is M to N, which means each mapper can send its data to all reducers, and each reducer can receive its input from all mappers. There could be a lot of reasons why you think the reduce copying phase is too
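A small sketch can illustrate the M-to-N relationship described here. It assumes keys are routed by hashing modulo the number of reducers, which is what Hadoop's default HashPartitioner does with `key.hashCode()`. The `zlib.crc32` call is only a deterministic stand-in for that hash, and the mapper names and keys are made up.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key: hash modulo the reducer count.

    crc32 is a stand-in (an assumption of this sketch) for the Java
    hashCode used by Hadoop's default HashPartitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical map outputs: the keys each mapper emitted.
mapper_outputs = {
    "mapper-0": ["apple", "banana", "cherry", "date"],
    "mapper-1": ["apple", "elderberry", "fig", "banana"],
}
num_reducers = 2

# For each reducer, record which mappers it has to copy data from.
sources = defaultdict(set)
for mapper, keys in mapper_outputs.items():
    for key in keys:
        sources[partition(key, num_reducers)].add(mapper)
```

Because the reducer is chosen from the key alone, the same key from different mappers always lands on the same reducer, so a single reducer may have to copy data from every mapper during the shuffle.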

InputFormat for dealing with log files.

2014-10-05 Thread Guillermo Ortiz
I'd like to know if there's an InputFormat that can deal with log files. The problem I have is that when I read a Tomcat log, for example, exceptions sometimes span several lines, but they should be processed as a single line, I mean all the lines together to the

Re: How to get the max number of reducers in Yarn

2014-10-05 Thread Guillermo Ortiz
Thanks for all your answers. So, if I don't ask for a specific number of reducers and I don't call setNumReduceTasks, how many reducers would I get? The default value? If I want to get the maximum number of reducers possible at any time, should I just set the number to the maximum integer and