Hi Alex,

That is indeed the recommended way: use binary values if you can. As long as a long gives you the same sort order as the string would, that's the way to go for sure.

Lars

On Dec 7, 2010, at 8:21, Alex Baranau <[email protected]> wrote:

> I think I've come across the key format, something like
> "<date><hour><smth>", several times on the list recently, which I assume
> is a string format.
>
> Please correct me if I'm wrong, but it makes more sense to me to use (while
> preserving all needed reading possibilities: by date, by hour, etc.)
> something like Bytes.add(<time>, <smth>) as a key instead, where <time> is
> the byte[] representation of the time (a long). Advantages would be a
> smaller key size (and since the key is stored for each cell in HBase, this
> means a reduction in data volume). I'd also imagine that it could avoid
> conversion between string/date/etc. representations.
>
> Am I missing something?
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Mon, Dec 6, 2010 at 7:27 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hi Peter,
>>
>> You can set the start row to '20101201|14' and the end row to '20101201|15'
>> using the scanner API:
>>
>> http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStartRow(byte[])
>>
>> Thanks
>> -Todd
>>
>> On Mon, Dec 6, 2010 at 9:21 AM, Peter Haidinyak <[email protected]> wrote:
>>
>>> Hi,
>>> I have to enter log data into HBase. We will need to query the data by
>>> Date:Hour.
>>> I am using 'Date|Hour|Incrementing Counter' as the Row Id. Is there an
>>> easy way to request the starting and stopping rows in a scan using
>>> something similar to 'like'?
>>>
>>> Scan 'T1', {STARTROW=>'like 20101201|14'}
>>>
>>> If not, what would be the best way to retrieve only one hour's worth of
>>> data? I am thinking of using another table to hold the incrementing count
>>> information for a Date|Hour and use that for Start/Stop.
>>>
>>> Thanks
>>>
>>> -Pete
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
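
For anyone finding this thread later, here is a rough sketch of the binary-key idea discussed above, combined with the start/stop row approach for reading one hour of data. It uses only the JDK so it can be run without a cluster. The class and method names (TimeKeys, rowKey, hourStart, hourStop) are made up for illustration and are not part of HBase. The encoding is meant to mirror what Bytes.add(Bytes.toBytes(time), suffix) would give you: an 8-byte big-endian long followed by the suffix. It assumes timestamps are non-negative epoch milliseconds and that "hour" means a UTC hour.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HBase class: builds time-prefixed binary row keys.
final class TimeKeys {
    static final long HOUR_MS = 3_600_000L;

    // Key layout: 8-byte big-endian timestamp, then an arbitrary suffix
    // (e.g. a counter or host id). ByteBuffer writes big-endian by default.
    static byte[] rowKey(long timeMillis, byte[] suffix) {
        return ByteBuffer.allocate(Long.BYTES + suffix.length)
                .putLong(timeMillis)
                .put(suffix)
                .array();
    }

    // Inclusive start row for the UTC hour containing timeMillis.
    static byte[] hourStart(long timeMillis) {
        long start = timeMillis - Math.floorMod(timeMillis, HOUR_MS);
        return ByteBuffer.allocate(Long.BYTES).putLong(start).array();
    }

    // Exclusive stop row: the first possible key of the following hour.
    static byte[] hourStop(long timeMillis) {
        long start = timeMillis - Math.floorMod(timeMillis, HOUR_MS);
        return ByteBuffer.allocate(Long.BYTES).putLong(start + HOUR_MS).array();
    }

    // Lexicographic comparison on unsigned bytes, which is how row keys
    // are ordered. Negative longs would sort after positive ones here,
    // hence the non-negative timestamp assumption.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long t = 1291212000000L + 1_234_567L; // some time after 2010-12-01 14:00 UTC
        byte[] key = rowKey(t, "counter-1".getBytes(StandardCharsets.UTF_8));
        boolean inRange = compare(hourStart(t), key) <= 0
                && compare(key, hourStop(t)) < 0;
        System.out.println("key falls inside its hour range: " + inRange);
    }
}
```

With the real client you would pass the two 8-byte arrays to the start and stop row of a Scan, as Todd describes for the string keys. The key is 8 bytes plus the suffix, versus 11 characters for a '20101201|14' style prefix, and no date parsing is needed on read.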
