Re: unique list of columns

Christopher Sat, 22 Feb 2014 19:13:17 -0800

I can't help but wonder whether the problem you're trying to solve
could be handled in a different way (for example, when your RFiles are
generated). What are you trying to do with the enumeration of columns?
If you're trying to do something like show them in a drop-down box in a
web interface, the list could potentially be very large... too big for
even one machine to handle, in the general case. Except in very
specific use cases, I can't imagine enumerating every column would be
very useful. Perhaps yours is such a use case, but I wonder...


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Sat, Feb 22, 2014 at 3:32 PM, Arshak Navruzyan <[email protected]> wrote:
> Mike,
>
> Thanks; this sounds promising.
>
> Arshak
>
> On Feb 22, 2014 11:48 AM, "Mike Drob" <[email protected]> wrote:
>>
>> There's not a single good way that I am aware of, but there are a couple
>> ways that will get you close.
>>
>> First, you can use the SortedKeyIterator to truncate values and
>> potentially save yourself a lot of data transfer.
>> Second, each RFile header block will track the columns contained, up to
>> 1000 (possibly configurable). Check out PrintInfo[1].
>>
>> Mike
>>
>> [1]:
>> https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
>>
>>
>> On Sat, Feb 22, 2014 at 11:25 AM, Arshak Navruzyan <[email protected]>
>> wrote:
>>>
>>> I don't know the inner workings of RFiles well enough, but I was wondering
>>> if there is a faster way to get a unique list of columns in Accumulo (short
>>> of doing a full MapReduce job). Is there some way to skip past all the values
>>> and just get to the next column?
>>>
>>> Thanks
>>
>>
>
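
The "skip ahead to the next column" idea from the original question can be
sketched outside of Accumulo. This is only a toy model, not Accumulo's API: it
assumes keys are plain (row, family, qualifier) string tuples held in one sorted
Python list, and it uses a binary search to stand in for a seek. The function
name unique_families and the seek counter are hypothetical, added for
illustration.

```python
import bisect


def unique_families(sorted_keys):
    """Collect distinct column families from keys sorted by
    (row, family, qualifier), seeking past each family's qualifiers.

    Toy model only: a sorted list and bisect stand in for a scanner seek.
    """
    families = set()
    seeks = 0
    i = 0
    while i < len(sorted_keys):
        row, fam, _qual = sorted_keys[i]
        families.add(fam)
        # Smallest possible key that sorts after every (row, fam, *) entry:
        # appending "\x00" gives the immediate successor of the family string.
        next_start = (row, fam + "\x00", "")
        i = bisect.bisect_left(sorted_keys, next_start, lo=i + 1)
        seeks += 1
    return families, seeks


keys = sorted([
    ("r1", "a", "x"), ("r1", "a", "y"), ("r1", "b", "x"),
    ("r2", "a", "z"), ("r2", "c", "q"),
])
families, seeks = unique_families(keys)
```

Even in this sketch, one seek is still needed per (row, family) pair, because
keys sort by row first. Skipping saves reading qualifiers and values, but it
does not avoid visiting every row.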
