Well, encoding it might save space, but strings are nice and human-readable, especially in the shell, and in the overall scheme of things, a string probably isn't really that much larger on disk, especially after compression.
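To put a rough number on the size question, here is a minimal sketch in plain Java (no Accumulo dependency). It assumes the fixed-length encoding is an 8-byte big-endian long and that the string encoding is the decimal digits as UTF-8 bytes. The class and method names (EncodingSizes, fixedLen, asString, and so on) are made up for illustration and are not Accumulo APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Accumulo.
class EncodingSizes {

    // Fixed-length form: always 8 bytes, big-endian (ByteBuffer's default order).
    static byte[] fixedLen(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode the 8-byte form back into a long.
    static long fromFixedLen(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // String form: the decimal digits as UTF-8, readable in a shell scan.
    static byte[] asString(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Decode the string form back into a long.
    static long fromString(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        long[] samples = {7L, 42L, 123456L, 99999999L, 100000000L, Long.MAX_VALUE};
        for (long v : samples) {
            // Print the encoded size of each sample in both forms.
            System.out.printf("%d: fixed=%d bytes, string=%d bytes%n",
                    v, fixedLen(v).length, asString(v).length);
        }
    }
}
```

Under those assumptions, a non-negative count stored as a string only exceeds 8 bytes once it reaches nine digits, so for typical word-count-style values the string form is no larger than the fixed-length form before compression.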
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Mon, May 13, 2013 at 6:09 PM, Mike Hugo <[email protected]> wrote:
> I've been playing around with the LongCombiner on a table that's summing up
> the counts of output of a MapReduce job, very similar to the WordCount
> example from the user manual.
>
> I started out encoding the values using LongCombiner.FIXED_LEN_ENCODER, but
> have noticed that this can lead to some confusion later on downstream. For
> example, a co-worker was scanning using the shell and was caught off guard
> by the encoded values. Also, out of the box, the StatsCombiner example
> works using String values, not Long values so we built a custom piece to
> essentially do the same thing with Long values instead.
>
> It looks to me like most of the examples I've seen just store things as
> String values, rather than encoding them. What are the tradeoffs? We're at
> a point where we could pretty easily switch things to just use strings - it
> seems like that might make things more convenient from a maintenance
> perspective (human readable values) and would allow us to re-use some
> existing components (e.g. StatsCombiner). Any thoughts?
>
> Thanks,
>
> Mike
