Re: error while using ArrayWritable

2010-01-02 Thread bharath v
Thanks J-D, worked perfectly fine! On Sat, Jan 2, 2010 at 1:16 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: This is explained in the javadoc: http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/io/ArrayWritable.html J-D On Fri, Jan 1, 2010 at 11:29 PM, bharath
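The javadoc's point, as commonly summarized: Hadoop creates keys and values reflectively through a no-argument constructor (ReflectionUtils.newInstance), while ArrayWritable needs to know its element class, so the documented fix is a subclass such as IntArrayWritable whose constructor calls super(IntWritable.class). The sketch below reproduces that mechanism with plain Java so it runs without Hadoop; MiniArrayWritable is a hypothetical stand-in, not Hadoop's class, and Integer stands in for IntWritable.

```java
import java.lang.reflect.Constructor;

public class ArrayWritableSketch {

    // Hypothetical stand-in for Hadoop's ArrayWritable (NOT the real class):
    // it needs its element class, so it has no no-argument constructor.
    static class MiniArrayWritable {
        private final Class<?> valueClass;

        MiniArrayWritable(Class<?> valueClass) {
            this.valueClass = valueClass;
        }

        Class<?> getValueClass() {
            return valueClass;
        }
    }

    // The pattern the javadoc recommends: a subclass whose no-arg constructor
    // fixes the element type (the real example passes IntWritable.class).
    static class IntArrayWritable extends MiniArrayWritable {
        IntArrayWritable() {
            super(Integer.class);
        }
    }

    // Roughly what a framework does before deserializing a value: call the
    // no-arg constructor reflectively. Returns null if there is no usable one.
    static Object tryNewInstance(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Fails: the base class cannot be built without knowing its element type.
        System.out.println("base: " + tryNewInstance(MiniArrayWritable.class));
        // Works: the subclass supplies the element type itself.
        IntArrayWritable w = (IntArrayWritable) tryNewInstance(IntArrayWritable.class);
        System.out.println("subclass element type: " + w.getValueClass().getSimpleName());
    }
}
```

In a real job the subclass (not ArrayWritable itself) is what you would name as the map output value class.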

I am not able to run this program in distributed mode but I am able to run it in pseudo distributed mode

2010-01-02 Thread Ravi
import java.lang.Integer;
import java.util.TreeMap;
import java.io.IOException;
import java.util.Date;
import java.util.Iterator;
import java.lang.StringBuilder;
import java.util.StringTokenizer;
import java.util.Random;
import java.lang.String;
import java.util.TreeSet;
import java.util.HashMap;

Re: I am not able to run this program in distributed mode but I am able to run it in pseudo distributed mode

2010-01-02 Thread Ravi
Could someone please go through the code and fix the bug? Thanks in advance. On Sat, Jan 2, 2010 at 10:05 PM, Ravi ravindra.babu.rav...@gmail.com wrote: import java.lang.Integer; import java.util.TreeMap; import java.io.IOException; import java.util.Date; import java.util.Iterator; import

Re: large reducer output with same key

2010-01-02 Thread Jason Venner
I have only seen that type of error when the tasktracker machine is very heavily loaded and the task does not exit in a timely manner after the tasktracker terminates it. Is this error in your task log or in the tasktracker log? On Fri, Jan 1, 2010 at 3:02 PM, himanshu chandola

Re: I am not able to run this program in distributed mode but I am able to run it in pseudo distributed mode

2010-01-02 Thread Todd Lipcon
http://catb.org/~esr/faqs/smart-questions.html#id383250 Please go through the above explanation of how to ask questions on a mailing list, and repost your question. Thanks in advance. On Sat, Jan 2, 2010 at 8:35 AM, Ravi ravindra.babu.rav...@gmail.com wrote: import java.lang.Integer; import

Re: HDFS read/write speeds, and read optimization

2010-01-02 Thread Stas Oskin
Hi. Can anyone advise on the subject below? Thanks! On Mon, Dec 28, 2009 at 9:01 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. Going back to the subject, has anyone ever benchmarked small (10-20 node) HDFS clusters? I did my own speed checks, and it seems I can reach ~77Mbps, on a

Re: HDFS read/write speeds, and read optimization

2010-01-02 Thread Eli Collins
Hey Stas, can you provide more information about your workload and environment? E.g., are you running t.o.a.h.h.BenchmarkThroughput or TestDFSIO, timing hadoop fs -put/get to transfer data to HDFS from another machine, looking at metrics, etc.? What else is running on the cluster? Have you
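Whatever tool is used, a throughput number means "bytes moved divided by wall-clock time" for one specific path (local disk, network, or both). A minimal, tool-agnostic sketch of that measurement, assuming you only want to time one sequential read: it drains any InputStream, so it could be handed the stream returned by Hadoop's FileSystem.open(path), but here it is exercised on in-memory data. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadThroughput {

    // Result of one timed read: total bytes and elapsed wall-clock time.
    static final class Result {
        final long bytes;
        final long nanos;

        Result(long bytes, long nanos) {
            this.bytes = bytes;
            this.nanos = nanos;
        }

        // Throughput in megabytes (10^6 bytes) per second.
        double megabytesPerSecond() {
            return nanos == 0 ? 0.0 : (bytes / 1e6) / (nanos / 1e9);
        }
    }

    // Drains the stream with a fixed-size buffer and times the whole read.
    static Result timeRead(InputStream in, int bufferSize) {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        long start = System.nanoTime();
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(total, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // In-memory stand-in data; an HDFS stream would be passed the same way.
        byte[] data = new byte[64 * 1024 * 1024];
        Result r = timeRead(new ByteArrayInputStream(data), 64 * 1024);
        System.out.printf("%d bytes, %.1f MB/s%n", r.bytes, r.megabytesPerSecond());
    }
}
```

Running the same helper against a local file and against an HDFS file from a client on another node is one way to separate disk speed from network speed.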

Re: large reducer output with same key

2010-01-02 Thread himanshu chandola
It is in the tasktracker log. The job is the same as before, so the machine is definitely not heavily loaded. It seems pretty weird that the data is written to the right mapred.local.dir but not read from there.

Re: HDFS read/write speeds, and read optimization

2010-01-02 Thread Andreas Kostyrka
Well, that all depends on many details, but:
-) Are you really using 4 discs (configured correctly as data directories)?
-) What HDD/connection technology?
-) And 77MB/s would match up curiously well with 1Gbit networking cards. So are you sure that you are testing a completely local setup? Where's
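The unit matters here: the original post says ~77Mbps, while the reply reads it as 77MB/s. With 8 bits per byte and decimal prefixes, the two readings differ by a factor of 8:

```latex
\begin{aligned}
1\ \text{Gbit/s} &= \frac{10^{9}\ \text{bit/s}}{8\ \text{bit/byte}} = 125\ \text{MB/s} \\
77\ \text{MB/s} &= 77 \times 8\ \text{Mbit/s} = 616\ \text{Mbit/s} \approx 62\%\ \text{of } 1\ \text{Gbit/s} \\
77\ \text{Mbit/s} &= 77 / 8\ \text{MB/s} \approx 9.6\ \text{MB/s}
\end{aligned}
```

This only bounds the raw line rate; how close a real HDFS transfer gets to it also depends on protocol overhead and the disks on either end.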

Small doubt in MR

2010-01-02 Thread bharath v
Hi, I want a particular section of code to run in only ONE of the mappers (any one will do), so I employed the following procedure: Main-Class { public boolean flag = true; Map-Class { if(flag) { flag=false; /* section of code

Re: Small doubt in MR

2010-01-02 Thread Mark Kerzner
I think you need some kind of semaphore that can be turned on by the first reducer. For example, creating a file in HDFS would work, if you could guarantee that it is an atomic operation (create-if-does-not-exist). Mark On Sat, Jan 2, 2010 at 10:04 PM, bharath v
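The create-if-does-not-exist semaphore can be sketched on a local filesystem, where java.nio's Files.createFile performs the existence check and the creation as one atomic step and throws FileAlreadyExistsException for every later caller. This is only an analogue: in HDFS the corresponding call would be FileSystem.create(path, false), which should fail if the file already exists, but its atomicity across concurrent tasks, and cleanup of the marker if the winning task later dies, are assumptions to verify. The marker name and class name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RunOnceMarker {

    // Tries to claim the "run once" slot by atomically creating a marker file.
    // At most one caller on this filesystem gets true for a given path.
    static boolean tryAcquire(Path marker) {
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // another caller created it first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical marker location, unique per run.
        Path marker = Paths.get(System.getProperty("java.io.tmpdir"),
                "run-once-" + System.nanoTime() + ".lock");
        for (int task = 0; task < 3; task++) {
            if (tryAcquire(marker)) {
                System.out.println("task " + task + " runs the special section");
            } else {
                System.out.println("task " + task + " skips it");
            }
        }
        marker.toFile().delete(); // clean up the demo marker
    }
}
```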

Re: Small doubt in MR

2010-01-02 Thread Matei Zaharia
If you want the code to run on only one machine, why not run it in your driver program that submits the MapReduce job? You could also create a special input record that tells whichever mapper gets it that it's the chosen one. However, note that that mapper may be run multiple times
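The special-record idea can be sketched without Hadoop: the driver would add one marker line to the input, and the map logic runs the special section only when it sees that line. Here mapTask is a hypothetical, simplified stand-in for one map task's loop over its records, the sentinel string is an arbitrary choice, and upper-casing is placeholder work for ordinary records. Because a task attempt can be re-executed, the special section should be idempotent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentinelRecord {

    // Hypothetical marker line placed in exactly one input file by the driver.
    static final String SENTINEL = "__RUN_ONCE__";

    // Simplified stand-in for one map task: appends to `log` when it sees the
    // sentinel, and returns its ordinary records after placeholder processing.
    static List<String> mapTask(List<String> records, List<String> log, String taskId) {
        List<String> processed = new ArrayList<>();
        for (String record : records) {
            if (SENTINEL.equals(record)) {
                // The special section; keep it idempotent, since this task
                // attempt may be re-run after a failure.
                log.add(taskId + " ran the special section");
                continue; // the sentinel is not ordinary data
            }
            processed.add(record.toUpperCase()); // placeholder "real" work
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        mapTask(Arrays.asList("a", "b"), log, "task-0");
        mapTask(Arrays.asList(SENTINEL, "c"), log, "task-1");
        mapTask(Arrays.asList("d"), log, "task-2");
        System.out.println(log);
    }
}
```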

Re: Small doubt in MR

2010-01-02 Thread brien colwell
Another approach would be to use a custom InputFormat implementation, with the flag as a property of the input split. Consider wrapping your InputFormat with something like 'InputFormatWithFlag' that returns splits that combine the wrapped InputFormat's splits with your flag. Since
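The wrapping step can be sketched in plain Java: take the splits the wrapped format produced and mark exactly one of them. Split, FlaggedSplit and withFlag are hypothetical names standing in for Hadoop's split types and for what an 'InputFormatWithFlag'-style getSplits() might do; a real split would also have to be serializable to the task (for example by implementing Writable) so the flag actually reaches the mapper, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlaggedSplits {

    // Stand-in for whatever split type the wrapped InputFormat produces.
    static final class Split {
        final String path;

        Split(String path) {
            this.path = path;
        }
    }

    // Hypothetical wrapper: the original split plus the "chosen one" flag.
    static final class FlaggedSplit {
        final Split inner;
        final boolean flag;

        FlaggedSplit(Split inner, boolean flag) {
            this.inner = inner;
            this.flag = flag;
        }
    }

    // Delegate to the wrapped splits, then mark exactly one (the first) of them.
    static List<FlaggedSplit> withFlag(List<Split> wrapped) {
        List<FlaggedSplit> out = new ArrayList<>();
        for (int i = 0; i < wrapped.size(); i++) {
            out.add(new FlaggedSplit(wrapped.get(i), i == 0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(new Split("part-0"), new Split("part-1"), new Split("part-2"));
        for (FlaggedSplit s : withFlag(splits)) {
            System.out.println(s.inner.path + " flag=" + s.flag);
        }
    }
}
```

A mapper would then check the flag of its own split once, at task start, instead of relying on a shared field.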