Hadoop constructing public instance variables?

2008-06-17 Thread Ted Dziuba
Hello all,

I have my own VersionedWritable class like so:


public class MyWritable extends VersionedWritable {

  public OtherWritable anotherWritable; // instance var, implements Writable
  ...

  public MyWritable() {
    anotherWritable = null;
  }

  @Override
  public void readFields(DataInput in) {
    if (condition) {
      anotherWritable = new OtherWritable();
      // ...
    }
  }
}


I can verify that the no-arg constructor of MyWritable is being called,
but it appears as if the no-arg constructor for OtherWritable is being
called some time between the construction of MyWritable and the call to
MyWritable's readFields().  Is this expected behavior, because the
OtherWritable is a public instance variable that implements Writable?
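
One way to narrow this down would be to capture a stack trace inside the
no-arg constructor, so the caller shows up in the task logs. A minimal sketch,
assuming a hypothetical stand-in class (TracedOtherWritable) rather than the
real OtherWritable, and a hypothetical helper (createViaHelper) playing the
part of whatever code ends up calling the constructor:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for OtherWritable; the real class would also
// implement org.apache.hadoop.io.Writable.
class TracedOtherWritable {
  // Call stack captured at construction time, kept so it can be logged later.
  final String constructedAt;

  TracedOtherWritable() {
    StringWriter buffer = new StringWriter();
    // A Throwable records the call stack where it is created; it is never thrown.
    new Throwable("TracedOtherWritable constructed")
        .printStackTrace(new PrintWriter(buffer, true));
    constructedAt = buffer.toString();
  }
}

class ConstructorTraceDemo {
  // Hypothetical caller, standing in for the unknown code path.
  static TracedOtherWritable createViaHelper() {
    return new TracedOtherWritable();
  }

  public static void main(String[] args) {
    // The printed trace lists createViaHelper and main as callers.
    System.err.println(createViaHelper().constructedAt);
  }
}
```

In a real job the same two lines inside the OtherWritable constructor would
show whether the call comes from framework code or from somewhere in
MyWritable itself.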

This is happening when a SequenceFile containing a bunch of these is
read into a Reducer.
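
A related point, as far as I understand it: Hadoop tends to reuse one Writable
instance and call readFields() on it once per record, so a nullable field
needs to be reset explicitly on every call. Below is a minimal sketch of that
pattern with a presence flag. It uses plain java.io and hypothetical stand-in
classes (OuterRecord, OtherRecord, ReuseDemo) instead of the real MyWritable
and OtherWritable, and the boolean flag is an assumed stand-in for the
"condition" above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for OtherWritable, holding a single int.
class OtherRecord {
  int value;

  void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }
}

// Hypothetical stand-in for MyWritable with an optional nested field.
class OuterRecord {
  OtherRecord other; // may be null

  void write(DataOutput out) throws IOException {
    out.writeBoolean(other != null); // presence flag
    if (other != null) {
      other.write(out);
    }
  }

  void readFields(DataInput in) throws IOException {
    if (in.readBoolean()) {
      if (other == null) {
        other = new OtherRecord();
      }
      other.readFields(in);
    } else {
      other = null; // clear state left over from the previous record
    }
  }
}

class ReuseDemo {
  static byte[] serialize(OuterRecord record) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    record.write(new DataOutputStream(bytes));
    return bytes.toByteArray();
  }

  static void deserializeInto(OuterRecord target, byte[] data) throws IOException {
    target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
  }

  public static void main(String[] args) throws IOException {
    OuterRecord withField = new OuterRecord();
    withField.other = new OtherRecord();
    withField.other.value = 7;
    byte[] first = serialize(withField);
    byte[] second = serialize(new OuterRecord()); // nested field absent

    OuterRecord reused = new OuterRecord(); // one instance, read into twice
    deserializeInto(reused, first);
    System.out.println(reused.other.value); // prints 7
    deserializeInto(reused, second);
    System.out.println(reused.other == null); // prints true
  }
}
```

Without the else branch, the second read would leave the nested object from
the first record in place, which can look like a constructor call that
happened "earlier".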

Ted


Re: Counters problem...

2008-05-22 Thread Ted Dziuba

I can confirm this behavior on Hadoop 0.16.4.

I miss the counters.

ted

Ion Badita wrote:

Ion Badita wrote:

Hi,

I have a problem with counters being updated after I upgraded my 
Hadoop from 0.15.1 to 0.16.4; I tried 0.17.0 too. The counters are 
first updated only after the first map task completes. The counters 
worked well in the older version.

Any ideas why?

Thanks.
Ion


Hi,

I tried to run the randomwriter example on a cluster of 6+1 computers 
with Hadoop 0.17.0. The counters are updated only when a map task 
finishes. If I keep refreshing the page, from time to time I see 
counters, and on the next refresh the counters are gone; this keeps 
happening until a map task finishes.


Can anyone do a test and confirm?

Thanks
Ion




Re: Hadoop performance on EC2?

2008-04-11 Thread Ted Dziuba
I have seen EC2 be slower than a comparable system in development, but 
not by the factors that you're experiencing.  One thing about EC2 that 
has concerned me: you are not guaranteed that your /mnt disk is an 
uncontested spindle.  Early on the spindle was uncontested in practice, 
but Amazon made no promises.


Also, and this may be a stupid question, are you sure that you're using 
the same JVM in EC2 and development?  GCJ is much slower than Sun's JVM.
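
A quick way to check is to print the JVM's own system properties on both
machines and compare. A minimal sketch; the class name JvmInfo is just for
illustration, while the property keys are the standard ones every JVM exposes:

```java
class JvmInfo {
  // Builds a one-line description of the running JVM from standard properties.
  static String describe() {
    return System.getProperty("java.vm.vendor") + " / "
        + System.getProperty("java.vm.name") + " / "
        + System.getProperty("java.version");
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```

Running it with the same java binary that starts the Hadoop daemons matters,
since a different JVM earlier on the PATH would give a misleading answer.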


Ted

Nate Carlson wrote:

On Thu, 10 Apr 2008, Ted Dunning wrote:

Are you trying to read from MySQL?


No, we're outputting to MySQL. I've also verified that the MySQL 
server is hardly seeing any load, isn't waiting on slow queries, etc.


If so, it isn't very surprising that you could get lower performance 
with more readers.


Indeed!


| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
|   depriving some poor village of its idiot since 1981|





Hadoop cluster build, machine specs

2008-04-04 Thread Ted Dziuba

Hi all,

I'm looking to build a small, 5-10 node cluster to run mostly CPU-bound 
Hadoop jobs.  I'm shying away from the 8-core behemoth type machines for 
cost reasons.  But what about dual core machines?  32 or 64 bits?


I'm still in the planning stages, so any advice would be greatly 
appreciated.


Thanks,

Ted


Read SequenceFile from C++?

2008-02-06 Thread Ted Dziuba
Does anyone have experience reading a SequenceFile from C++?  I don't 
need to write, just read.  I have looked at the RecordIO C++ libraries, 
but can't connect this to a Hadoop InputFormat.  I'm using Hadoop 0.14.


Any suggestions would be appreciated.

Ted