Hadoop constructing public instance variables?
Hello all,

I have my own VersionedWritable class like so:

    public class MyWritable extends VersionedWritable {
        public OtherWritable anotherWritable; // instance var, implements Writable
        ...
        public MyWritable() {
            anotherWritable = null;
        }

        @Override
        public void readFields(DataInput in) {
            if (condition) {
                anotherWritable = new OtherWritable();
                // ...
            }
        }
    }

I can verify that the no-arg constructor of MyWritable is being called, but it appears as if the no-arg constructor for OtherWritable is being called some time between the construction of MyWritable and the call to MyWritable's readFields(). Is this expected behavior, because the OtherWritable is a public instance variable that implements Writable?

This is happening when a SequenceFile containing a bunch of these is read into a Reducer.

Ted
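One way to narrow this down is to have the constructor record who invoked it. The sketch below is only a debugging aid and does not settle whether the behavior is expected. It has no Hadoop dependency: TracedWritable is a hypothetical stand-in for OtherWritable, and readFieldsLike() is a hypothetical stand-in for a readFields() that creates the nested object itself.

```java
public class ConstructorTrace {

    /** Hypothetical stand-in for OtherWritable (no Hadoop dependency). */
    public static class TracedWritable {
        /** "ClassName.methodName" of the frame that invoked the constructor. */
        public final String constructedBy;

        public TracedWritable() {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            // stack[0] is this constructor; stack[1] is the frame that invoked it.
            constructedBy = stack.length > 1
                    ? stack[1].getClassName() + "." + stack[1].getMethodName()
                    : "<unknown>";
        }
    }

    /** Mimics a readFields() that constructs the nested object directly. */
    public static TracedWritable readFieldsLike() {
        return new TracedWritable();
    }

    public static void main(String[] args) throws Exception {
        // Direct construction: the caller is readFieldsLike().
        System.out.println("direct:     " + readFieldsLike().constructedBy);

        // Reflective construction: the caller is a reflection frame,
        // whose exact name depends on the JVM version.
        TracedWritable viaReflection =
                TracedWritable.class.getDeclaredConstructor().newInstance();
        System.out.println("reflective: " + viaReflection.constructedBy);
    }
}
```

Putting the same few lines into the real OtherWritable constructor (or simply calling new Throwable().printStackTrace() there) should show in the task logs whether the unexpected call comes from your own code or from a framework frame.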
Re: Counters problem...
I can confirm this behavior on Hadoop 0.16.4. I miss the counters.

ted

Ion Badita wrote:

Ion Badita wrote:

Hi,

I have a problem with counters being updated after I upgraded my Hadoop from 0.15.1 to 0.16.4, and I tried 0.17.0 too. The counters are first updated only after the first map task completes. The counters worked well in the older version. Any ideas why?

Thanks.
Ion

Hi,

I tried to run the randomwriter example on a cluster of 6+1 computers with Hadoop 0.17.0. The counters are updated only when a map task finishes. If I keep refreshing the page, from time to time I see the counters, and on the next refresh they are gone; this continues until a map task finishes. Can anyone run a test and confirm?

Thanks
Ion
Re: Hadoop performance on EC2?
I have seen EC2 be slower than a comparable system in development, but not by the factors that you're experiencing. One thing about EC2 that has concerned me - you are not guaranteed that your /mnt disk is an uncontested spindle. Early on, this was the case, but Amazon made no promises.

Also, and this may be a stupid question, are you sure that you're using the same JVM in EC2 and development? GCJ is much slower than Sun's JVM.

Ted

Nate Carlson wrote:

On Thu, 10 Apr 2008, Ted Dunning wrote:

Are you trying to read from mySQL?

No, we're outputting to MySQL. I've also verified that the MySQL server is hardly seeing any load, isn't waiting on slow queries, etc.

If so, it isn't very surprising that you could get lower performance with more readers.

Indeed!

| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981|
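On the JVM question raised in the reply above, a quick check is to print the JVM's identity on both the EC2 instance and the development machine and compare the output. This is a minimal sketch using only standard Java system properties; the class name JvmInfo is made up for illustration.

```java
public class JvmInfo {

    /** Returns the VM name, VM vendor, and Java version as one line. */
    public static String describe() {
        return System.getProperty("java.vm.name")
                + " | " + System.getProperty("java.vm.vendor")
                + " | " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        // Run with the same java binary that launches the Hadoop daemons.
        System.out.println(describe());
    }
}
```

For the comparison to mean anything, it should be run with the same java executable that the Hadoop scripts pick up on each machine, since a different one may be first on the PATH.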
Hadoop cluster build, machine specs
Hi all, I'm looking to build a small, 5-10 node cluster to run mostly CPU-bound Hadoop jobs. I'm shying away from the 8-core behemoth type machines for cost reasons. But what about dual core machines? 32 or 64 bits? I'm still in the planning stages, so any advice would be greatly appreciated. Thanks, Ted
Read SequenceFile from C++?
Does anyone have experience reading a SequenceFile from C++? I don't need to write, just read. I have looked at the RecordIO C++ libraries, but can't connect this to a Hadoop InputFormat. I'm using Hadoop 0.14. Any suggestions would be appreciated. Ted