I can't help but wonder whether the problem you're trying to solve could be handled a different way (for example, at the time your RFiles are generated). What are you trying to do with the enumeration of columns? If you're trying to do something like show them in a drop-down box in a web interface, the list could be quite exhaustive... too big for even one machine to handle, in the general case. Except in very specific use cases, I can't imagine enumerating every column would be very useful. Perhaps yours is such a use case, but I wonder...
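
For what it's worth, here is a rough sketch of the "skip ahead to the next column" idea from the question quoted below. It is plain Java over an in-memory sorted set, not the Accumulo API: the Key class, the distinctFamilies method, and the sample data are all made up for illustration. It only assumes that keys sort by row first, then column family, then qualifier, which is how Accumulo orders them. After seeing a family, it seeks to the smallest key that could follow every qualifier in that row/family pair.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

class ColumnSkipSketch {

    // Hypothetical stand-in for a table key: sorted by row, then family, then qualifier.
    static final class Key implements Comparable<Key> {
        final String row;
        final String family;
        final String qualifier;

        Key(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public int compareTo(Key other) {
            int c = row.compareTo(other.row);
            if (c != 0) return c;
            c = family.compareTo(other.family);
            if (c != 0) return c;
            return qualifier.compareTo(other.qualifier);
        }
    }

    // Collect distinct column families by seeking past each (row, family) group
    // instead of reading every entry in it.
    static SortedSet<String> distinctFamilies(NavigableSet<Key> keys) {
        SortedSet<String> families = new TreeSet<>();
        Key current = keys.isEmpty() ? null : keys.first();
        while (current != null) {
            families.add(current.family);
            // family + "\0" is the smallest string greater than family, so this
            // key sorts after every qualifier of the current row/family pair.
            Key seekTo = new Key(current.row, current.family + "\0", "");
            current = keys.ceiling(seekTo);
        }
        return families;
    }

    public static void main(String[] args) {
        NavigableSet<Key> keys = new TreeSet<>();
        keys.add(new Key("row1", "attr", "age"));
        keys.add(new Key("row1", "attr", "name"));
        keys.add(new Key("row1", "meta", "ts"));
        keys.add(new Key("row2", "attr", "name"));
        keys.add(new Key("row2", "tags", "x"));
        System.out.println(distinctFamilies(keys)); // prints [attr, meta, tags]
    }
}
```

Note that because the row comes first in the sort order, the skipping only helps inside a row. The loop still has to visit every row once, and the result set still has to fit somewhere, which is the size concern I raised above.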
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Sat, Feb 22, 2014 at 3:32 PM, Arshak Navruzyan <[email protected]> wrote:
> Mike,
>
> Thanks; this sounds promising.
>
> Arshak
>
> On Feb 22, 2014 11:48 AM, "Mike Drob" <[email protected]> wrote:
>>
>> There's not a single good way that I am aware of, but there are a couple
>> ways that will get you close.
>>
>> First, you can use the SortedKeyIterator to truncate values and
>> potentially save yourself a lot of data transfer.
>> Second, each RFile header block will track the columns contained, up to
>> 1000 (possibly configurable). Check out PrintInfo[1].
>>
>> Mike
>>
>> [1]:
>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
>>
>>
>> On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan <[email protected]>
>> wrote:
>>>
>>> I don't know the inner workings of the Rfiles enough but I was wondering
>>> if there is a faster way to get a unique list of columns in Accumulo (short
>>> of doing a full mapreduce). Is there some way to skip ahead all the volumes
>>> and just get to the next column?
>>>
>>> Thanks
>>
>>
>
