If the encoded data is only used as the value, it really comes down to your preference. Since it sounds like the encoded values are causing trouble for people using the shell, you can switch to String representations. A possible alternative is the views we have in the shell (transformations? I don't remember the name, and I don't know much about them).
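To make the tradeoff concrete, here is a rough, standalone sketch of the two representations. It does not use any Accumulo classes: the helper names are made up for illustration, and I'm assuming the fixed-length encoding is an 8-byte big-endian long, so check the LongCombiner source before relying on that.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ValueEncodingSketch {
    // Assumed fixed-length layout: 8 bytes, big-endian (ByteBuffer's default order).
    public static byte[] encodeFixed(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static long decodeFixed(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // String layout: the decimal digits of the number as UTF-8 text.
    public static byte[] encodeString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    public static long decodeString(byte[] b) {
        return Long.parseLong(new String(b, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        long count = 42L;
        byte[] fixed = encodeFixed(count);
        byte[] text = encodeString(count);

        // The fixed form is always 8 bytes; the string form grows with the digits.
        System.out.println("fixed bytes:  " + fixed.length);
        System.out.println("string bytes: " + text.length);

        // Only the string form is readable when dumped as text.
        System.out.println("string value as text: " + new String(text, StandardCharsets.UTF_8));

        // Summing works in either format as long as reader and writer agree.
        System.out.println("fixed sum:  " + (decodeFixed(fixed) + decodeFixed(encodeFixed(8L))));
        System.out.println("string sum: " + (decodeString(text) + decodeString(encodeString(8L))));

        // Mixing formats fails: the fixed bytes are not decimal digits.
        try {
            decodeString(fixed);
        } catch (NumberFormatException e) {
            System.out.println("string decoder rejected the fixed-length bytes");
        }
    }
}
```

The string form costs a parse on every read but is what a person sees in a scan; the fixed form is compact and uniform but opaque without a decoder.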
Another concern: if you have iterators/combiners running on the values, they need to be aware of the format. Ultimately, the format itself doesn't really matter; what matters is that you stay consistent from then on.

On Mon, May 13, 2013 at 6:09 PM, Mike Hugo <[email protected]> wrote:
> I've been playing around with the LongCombiner on a table that's summing
> up the counts of output of a MapReduce job, very similar to the WordCount
> example from the user manual.
>
> I started out encoding the values using LongCombiner.FIXED_LEN_ENCODER,
> but have noticed that this can lead to some confusion later on downstream.
> For example, a co-worker was scanning using the shell and was caught off
> guard by the encoded values. Also, out of the box, the StatsCombiner
> example works using String values, not Long values so we built a custom
> piece to essentially do the same thing with Long values instead.
>
> It looks to me like most of the examples I've seen just store things are
> String values, rather than encoding them. What are the tradeoffs? We're
> at a point where we could pretty easily switch things to just use strings -
> it seems like that might make things more convenient from a maintenance
> perspective (human readable values) and would allow us to re-use some
> existing components (e.g. StatsCombiner). Any thoughts?
>
> Thanks,
>
> Mike
