Re: Best Practices Adding Rows

Lars George Tue, 07 Dec 2010 00:54:39 -0800

Hi Alex,

That is indeed the recommended way, i.e. use binary values if you can. As long
as a long gives you the same sort order as the string would, that's the way to
go for sure.
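
To illustrate the sort-order point, here is a minimal sketch in Python. It is not the HBase client API; it only mimics the byte layout that Bytes.toBytes(long) produces (8 bytes, big-endian). The helper names row_key and hour_range, the 4-byte counter width, and the use of UTC are assumptions made for the example. Note that the ordering only holds for non-negative timestamps, since a negative long has its sign bit set and would sort after the positive ones.

```python
import struct
from datetime import datetime, timedelta, timezone


def row_key(ts_millis: int, counter: int) -> bytes:
    """Hypothetical key: 8-byte big-endian timestamp + 4-byte big-endian counter."""
    return struct.pack(">qI", ts_millis, counter)


def hour_range(year: int, month: int, day: int, hour: int) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) key prefixes for one hour, in UTC."""
    begin = datetime(year, month, day, hour, tzinfo=timezone.utc)
    end = begin + timedelta(hours=1)
    start_ms = int(begin.timestamp()) * 1000
    stop_ms = int(end.timestamp()) * 1000
    # Only the timestamp prefix is needed: any full key inside the hour
    # compares >= start and < stop as a plain byte string.
    return struct.pack(">q", start_ms), struct.pack(">q", stop_ms)


if __name__ == "__main__":
    start, stop = hour_range(2010, 12, 1, 14)
    base_ms = struct.unpack(">q", start)[0]
    keys = [
        row_key(base_ms + 1_000, 2),
        row_key(base_ms - 1, 0),          # just before the hour
        row_key(base_ms + 1_000, 1),
        row_key(base_ms + 3_600_000, 0),  # first millisecond of the next hour
    ]
    # Byte-wise sorting matches (timestamp, counter) numeric sorting.
    in_hour = [k for k in sorted(keys) if start <= k < stop]
    for k in in_hour:
        print(struct.unpack(">qI", k))
```

The two byte arrays returned by hour_range play the role of the start row and stop row of a scan, so selecting one hour needs no string formatting or parsing of the key.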


Lars

On Dec 7, 2010, at 8:21, Alex Baranau <[email protected]> wrote:

> I think I've come across the key format, something like "<date><hour><smth>",
> several times on the list recently. I assume that is a string format.
> 
> Please correct me if I'm wrong, but it makes more sense to me to use (while
> preserving all the needed read patterns: by date, by hour, etc.) something
> like Bytes.add(<time>, <smth>) as the key instead, where <time> is the byte[]
> representation of the time (a long). The advantage would be a smaller key size
> (and since the key is stored with each cell in HBase, this reduces the amount
> of data). I'd also imagine that it avoids conversions between
> string/date/etc. representations.
> 
> Am I missing something?
> 
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> 
> On Mon, Dec 6, 2010 at 7:27 PM, Todd Lipcon <[email protected]> wrote:
> 
>> Hi Peter,
>> 
>> You can set the start row to '20101201|14' and the end row to '20101201|15'
>> using the scanner API:
>> 
>> http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStartRow(byte[])
>> 
>> Thanks
>> -Todd
>> 
>> On Mon, Dec 6, 2010 at 9:21 AM, Peter Haidinyak <[email protected]> wrote:
>> 
>>> Hi,
>>> I have to enter log data into HBase. We will need to query the data by
>>> Date:Hour.
>>> I am using 'Date|Hour|Incrementing Counter' as the Row Id. Is there an
>>> easy way to request the starting and stopping rows in a scan using something
>>> similar to 'like'?
>>> 
>>> Scan 'T1', {STARTROW=>'like 20101201|14'}
>>> 
>>> If not, what would be the best way to retrieve only one hour's worth of
>>> data? I am thinking of using another table to hold the incrementing count
>>> information for a Date|Hour and use that for Start/Stop.
>>> 
>>> Thanks
>>> 
>>> -Pete
>>> 
>> 
>> 
>> 
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
