Hi,

Over at Mahout (http://lucene.apache.org/mahout) we have a Vector interface with two implementations, DenseVector and SparseVector. When writing a Mapper/Reducer, we have been able to use just Vector, but when it comes to actually binding real data via a Configuration, we need to specify, I think, the actual implementation being used, as in something like conf.setOutputValueClass(SparseVector.class);

Ideally, we'd like to put off picking a particular implementation until as late as possible. Right now, we've pushed this off to the user, who passes in the implementation, but even that is less than ideal for a variety of reasons. While we typically wouldn't expect the data to be a mixture of Dense and Sparse, there really shouldn't be a reason why it can't be. We realize we could write out the class name to the DataOutput (we implement Writable), but that forces us to either hack in some String compares or use Class.forName(), which seems like it wouldn't perform well (although I admit I haven't tested that yet; presumably the JDK can cache the info).
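To make the class-name idea concrete, here is a rough sketch of what I have in mind. It is untested against Hadoop and is not what Mahout does today. The Vector interface and the two implementations below are simplified stand-ins that only mirror Writable's write/readFields methods, and VectorEnvelope and its cache are names I made up for illustration. Each record is prefixed with its class name, and the Class.forName() result is kept in a map so the lookup happens once per class name rather than once per record.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class VectorEnvelope {

    // Hypothetical stand-in for the Vector interface; same shape as Writable.
    public interface Vector {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Simplified dense implementation: stores every value.
    public static class DenseVector implements Vector {
        public double[] values = new double[0];
        public DenseVector() {}
        public DenseVector(double... values) { this.values = values; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (double v : values) out.writeDouble(v);
        }
        public void readFields(DataInput in) throws IOException {
            values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        }
    }

    // Simplified sparse implementation: stores (index, value) pairs.
    public static class SparseVector implements Vector {
        public int[] indices = new int[0];
        public double[] values = new double[0];
        public SparseVector() {}
        public SparseVector(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }
        public void write(DataOutput out) throws IOException {
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeDouble(values[i]);
            }
        }
        public void readFields(DataInput in) throws IOException {
            int n = in.readInt();
            indices = new int[n];
            values = new double[n];
            for (int i = 0; i < n; i++) {
                indices[i] = in.readInt();
                values[i] = in.readDouble();
            }
        }
    }

    // Cache of resolved classes, so Class.forName() runs once per class name.
    private static final ConcurrentMap<String, Class<? extends Vector>> CACHE =
            new ConcurrentHashMap<>();

    // Writes the concrete class name, then the vector's own payload.
    public static void write(Vector v, DataOutput out) throws IOException {
        out.writeUTF(v.getClass().getName());
        v.write(out);
    }

    // Reads the class name, instantiates that class, and lets it read its payload.
    public static Vector read(DataInput in) throws IOException {
        String name = in.readUTF();
        Class<? extends Vector> cls = CACHE.get(name);
        if (cls == null) {
            try {
                cls = Class.forName(name).asSubclass(Vector.class);
            } catch (ClassNotFoundException e) {
                throw new IOException("Unknown vector class: " + name, e);
            }
            CACHE.putIfAbsent(name, cls);
        }
        try {
            Vector v = cls.getDeclaredConstructor().newInstance();
            v.readFields(in);
            return v;
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + name, e);
        }
    }
}
```

With something like this, the Mapper/Reducer could keep working against Vector, and a single stream could mix Dense and Sparse records. The obvious cost is the class name repeated in every record, and I haven't measured whether the cached lookup actually performs acceptably.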

Thanks,
Grant
