Re: sorting by value

2012-08-31 Thread Tom Brown
We do numerical sorting within some of our tables. We put the numerical values as fixed length byte arrays within the keys (and flipped the sign bit so negative values are lexicographically lower than positive values). Of course, it's still part of the key so that technique doesn't work for
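A minimal Python sketch of the sign-bit trick. It assumes the values are signed 64-bit integers stored as 8 big-endian bytes. IEEE floats would also need their remaining bits inverted for negative values, so this exact encoding does not carry over to them. The function names are illustrative only.

```python
import struct

INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1
SIGN_BIT = 1 << 63


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit int as 8 big-endian bytes whose unsigned
    lexicographic order matches numeric order."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("value out of int64 range")
    # Take the two's-complement bit pattern, then flip the sign bit so
    # negatives (sign bit 1) sort below non-negatives (sign bit 0).
    unsigned = (value & 0xFFFFFFFFFFFFFFFF) ^ SIGN_BIT
    return struct.pack(">Q", unsigned)


def decode_int64(data: bytes) -> int:
    """Inverse of encode_int64."""
    (unsigned,) = struct.unpack(">Q", data)
    raw = unsigned ^ SIGN_BIT  # restore the original two's-complement pattern
    return raw - (1 << 64) if raw & SIGN_BIT else raw
```

Sorting by the encoded bytes gives the same order as sorting the numbers themselves. For example, `sorted([42, -7, 0], key=encode_int64)` yields `[-7, 0, 42]`, which is the property that lets such values live inside a row key.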

Re: md5 hash key and splits

2012-08-31 Thread Stack
On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread across the cluster to avoid HotSpots? Time series data is a particular case [1] and the sematextians have tools to help w/ that
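Pre-splitting spreads load only if row keys fall evenly across the key space. With an MD5 prefix that is roughly the case, so evenly spaced split points over the leading bytes tend to give each region a similar share. The sketch below is a hypothetical helper, not an HBase API. It computes `num_regions - 1` split keys over the first 4 bytes of the key. Passing them to table creation (for example through the split-keys overload of `HBaseAdmin.createTable`) is assumed rather than shown.

```python
import struct


def uniform_split_keys(num_regions: int) -> list[bytes]:
    """Return num_regions - 1 split keys dividing the 4-byte prefix space
    [0x00000000, 0xFFFFFFFF] into equal ranges.

    Assumes row keys begin with uniformly distributed bytes (e.g. an MD5
    prefix); for sequential keys these splits would not balance load.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 1 << 32  # number of distinct 4-byte prefixes
    return [struct.pack(">I", i * space // num_regions)
            for i in range(1, num_regions)]
```

With four regions the boundaries land at `0x40`, `0x80` and `0xC0` in the first byte, so each region covers a quarter of the hashed key space.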

Re: sorting by value

2012-08-31 Thread Pamecha, Abhishek
Thanks St.Ack and Tom. Yes, I too kinda came up with a similar scheme: to store the rank as part of the key. Where it broke down for me was for, say, k-dimensional data where ranks are stored for dimension A but the query requires sorting by dimension B. For now I have to settle with

Re: sorting by value

2012-08-31 Thread Pamecha, Abhishek
Btw, liked the bit flipping for negative values. It didn't occur to me right off, it would be a problem i Sent from my iPad with iMstakes On Aug 30, 2012, at 23:14, Tom Brown tombrow...@gmail.com wrote: We do numerical sorting within some of our tables. We put the numerical values as

Re: HBase and unit tests

2012-08-31 Thread Cristofer Weber
Hi Sonal, Stack and Ulrich! Yes, I should provide more details :$ I found the links you provided when I was searching for a way to start HBase with JUnit. From the defaults, the only params I have changed are the ZooKeeper port and the number of nodes, which is 1 in my case. Based on logs I suspect

Re: HBase and unit tests

2012-08-31 Thread n keywal
Hi Cristofer, HBase starts a minicluster for many of its tests because we have a lot of destructive tests; otherwise, the non-destructive tests would be affected by the destructive ones. When writing a client application, you usually don't need to do that: you can rely on the same instance for all your

Re: HBase and unit tests

2012-08-31 Thread Ulrich Staudinger
Hi Cristofer, At least 15 seconds are spent on starting the mini cluster for each test case. Are you sure that you are reusing your mini cluster across unit tests? HTH2, Ulrich On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: Hi Sonal, Stack and

Re: HBase and unit tests

2012-08-31 Thread Cristofer Weber
Hi Ulrich, Yes, I'm starting the mini cluster inside @BeforeClass. There are 3 different test cases, and between 2 and 15 tests per test case. Thanks! Best regards, Cristofer -Original Message- From: ustaudin...@gmail.com [mailto:ustaudin...@gmail.com] On Behalf Of Ulrich Staudinger Sent

Re: HBase and unit tests

2012-08-31 Thread n keywal
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber cristofer.we...@neogrid.com wrote: For the other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, etc) they managed to run tests as Internal and External for unit tests and also have a profile for Performance and Concurrent tests,

Re: md5 hash key and splits

2012-08-31 Thread Doug Meil
Stack, re: Where did you read that?, I think he might also be referring to this... http://hbase.apache.org/book.html#important_configurations On 8/30/12 8:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread

Re: [maybe off-topic?] article: Solving Big Data Challenges for Enterprise Application Performance Management

2012-08-31 Thread Andrew Purtell
Asynchbase redone with PB and attention to security would be a good place to start. I can't commit resources in the immediate term, so that's easy for me to say, I know. Anyway, it seems we're on the same page wrt the client. On Friday, August 31, 2012, lars hofhansl wrote: Many of us have been saying

Re: md5 hash key and splits

2012-08-31 Thread Mohit Anchlia
On Thu, Aug 30, 2012 at 11:52 PM, Stack st...@duboce.net wrote: On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread across the cluster to avoid HotSpots? Time series data is a

Re: md5 hash key and splits

2012-08-31 Thread Stack
On Fri, Aug 31, 2012 at 6:09 AM, Doug Meil doug.m...@explorysmedical.com wrote: Stack, re: Where did you read that?, I think he might also be referring to this... http://hbase.apache.org/book.html#important_configurations I'd say we need to revisit that paragraph. It gives a 'wrong'

Re: md5 hash key and splits

2012-08-31 Thread Stack
On Fri, Aug 31, 2012 at 7:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: My data is timeseries and to get random distribution and still have the keys in the same region for a user I am thinking of using md5(userid)+reversetimestamp as a row key. But with this type of key how can one do

Re: Allocating more heap for endpoint coprocessors

2012-08-31 Thread Gary Helmling
Maybe we need to add a coprocessors section to the ref guide. I think all the current documentation is in javadoc. And if all the potentially destabilizing issues of in-process coprocessor usage are not yet called out (memory usage, cpu, etc), we could more explicitly detail that. If we want to

Re: Can I use coprocessor to record the deleted data caused by ttl?

2012-08-31 Thread lars hofhansl
Yes (in 0.94.2+). But it would be quite tricky. You'd have to hook into the compaction. There are new hooks now in RegionObserver (preCompactScannerOpen and preFlushScannerOpen). See HBASE-6427. These two hooks are passed the scanners that provide the set of KVs to be compacted. You could
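The core idea of recording TTL-expired data during compaction can be sketched independently of the HBase API. The sketch wraps the stream of cells being compacted, passes through the ones that survive, and records the expired ones instead of silently dropping them. This is a language-neutral illustration in Python, not coprocessor code. `Cell` is a hypothetical stand-in for a KeyValue, and the expiry rule `timestamp < now - ttl` is an assumption modelled on the usual TTL semantics.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for an HBase KeyValue."""
    row: bytes
    timestamp_ms: int
    value: bytes


def ttl_filtering_scan(cells: Iterable[Cell], ttl_ms: int, now_ms: int,
                       expired_log: List[Cell]) -> Iterator[Cell]:
    """Yield cells that survive the TTL; append expired ones to expired_log.

    This is a generator: expired_log is only filled as the caller consumes
    it, just as a compaction pulls cells from its scanner one at a time.
    """
    cutoff = now_ms - ttl_ms  # assumed rule: older than now - ttl is expired
    for cell in cells:
        if cell.timestamp_ms < cutoff:
            expired_log.append(cell)  # record it before it is discarded
        else:
            yield cell
```

In a real coprocessor, the "record" step would need to write somewhere durable. This sketch only shows the filter-and-record shape of the wrapper.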