Hi Nicolas!

The other adapters (Cassandra, Cassandra + Thrift, Cassandra + Astyanax, 
etc.) run their unit tests as Internal and External sets and also have a 
profile for Performance and Concurrent tests. External and 
Performance/Concurrent run against a live database instance; only the 
Internal tests are expected to start a database per test case, and they are 
otherwise the same tests as External. The HBase adapter already has External 
and Performance/Concurrent, so I'm trying to provide the Internal set, whose 
objective is to test the Titan|HBase interaction. 

And my goal is to achieve better times than Cassandra :-)

A singleton seems to be a good option, but I have to check whether Maven 
Surefire can keep the same process across JUnit test cases. 

Because Titan works with adapters for different databases and manages table/CF 
creation when they do not exist, I think it will not be possible to prefix 
table names per test without changing some core components of Titan, which 
seems too invasive to do now. Deletion is fast enough, so we can keep the same 
table.

Thanks!!

Best regards,
Cristofer

-----Original message-----
From: n keywal [mailto:[email protected]] 
Sent: Friday, August 31, 2012 07:59
To: [email protected]
Subject: Re: HBase and unit tests

Hi Cristofer,

HBase starts a mini cluster for many of its tests because we have a lot of 
destructive tests; otherwise the non-destructive tests would be impacted by 
the destructive ones. When writing a client application, you usually don't 
need to do that: you can rely on the same instance for all your tests.

As well, it's useful to write the tests in a way compatible with a real cluster 
or a pseudo distributed one. Sometimes, when the test fails, you want to have a 
look at what the code wrote or found in HBase: you won't have this in a mini 
cluster. And it saves a start.

I don't know if there is a blog entry on this, but it's not very difficult to 
do (though, as usual, not that easy when you start). I've personally done it 
with a singleton class + prefixing the table names with a random key (to allow 
multiple tests in parallel on the same cluster without relying on cleanup) + 
getProperty to decide between starting a mini cluster or connecting to a 
cluster.
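
As a rough illustration of the three pieces listed above (singleton, random 
table prefix, getProperty switch), a minimal Java sketch might look like the 
following. It is not the code from the original setup: the class name 
SharedTestCluster, the property name test.hbase.mode and its values are 
hypothetical, and the HBase calls are only indicated in comments so that the 
sketch compiles without HBase on the classpath.

```java
import java.util.UUID;

/** Shared test environment: created once per JVM and reused by all test classes. */
final class SharedTestCluster {
    // Hypothetical property name. "mini" (the default here) means start an
    // in-process mini cluster; "external" means connect to a running cluster.
    static final String MODE_PROPERTY = "test.hbase.mode";

    private static SharedTestCluster instance;

    private final boolean external;
    private final String tablePrefix;

    private SharedTestCluster(boolean external) {
        this.external = external;
        // Random prefix so that parallel runs on the same cluster do not
        // collide and do not depend on cleanup from earlier runs.
        String random = UUID.randomUUID().toString().replace("-", "");
        this.tablePrefix = "t" + random.substring(0, 8) + "_";
        if (!external) {
            // Assumption: this is where the mini cluster would be started,
            // e.g. with HBaseTestingUtility.startMiniCluster().
        }
        // Otherwise the client configuration for the existing cluster would
        // be loaded here, e.g. with HBaseConfiguration.create().
    }

    /** Returns the single shared instance, creating it on first use. */
    static synchronized SharedTestCluster get() {
        if (instance == null) {
            String mode = System.getProperty(MODE_PROPERTY, "mini");
            instance = new SharedTestCluster("external".equalsIgnoreCase(mode));
        }
        return instance;
    }

    boolean isExternal() {
        return external;
    }

    /** Maps a logical table name to the prefixed name used in this run. */
    String tableName(String base) {
        return tablePrefix + base;
    }
}
```

A test would call SharedTestCluster.get().tableName("edges") wherever it needs 
a table name, and the run could be pointed at a real cluster with something 
like -Dtest.hbase.mode=external. The singleton only saves startup time if all 
test classes run in the same JVM.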

HTH,

Nicolas


On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber <[email protected]> wrote:

> Hi Sonal, Stack and Ulrich!
>
> Yes, I should provide more details :$
>
> I found the links you provided when I was searching for a way to 
> start HBase with JUnit. The only parameters I have changed from the 
> defaults are the ZooKeeper port and the number of nodes, which is 1 in 
> my case. Based on the logs, I suspect that most of the time is spent on 
> HDFS, and that's why I asked if there is a way to start a standalone 
> instance of HBase. The amount of data written by each test case would 
> probably fit in the memstore anyway, and table cleanup between test 
> methods is handled by a loop of deletes.
>
> At least 15 seconds are spent on starting the mini cluster for each 
> test case.
>
> I just remembered that I should turn off the WAL when running unit 
> tests :-), but this will not affect startup time.
>
> Thanks!!
>
> Best regards,
> Cristofer
>
> ________________________________________
> From: Ulrich Staudinger [[email protected]]
> Sent: Friday, August 31, 2012 2:21
> To: [email protected]
> Subject: Re: HBase and unit tests
>
> As general advice, although you probably do take care of this, 
> instantiate the mini cluster only once in your JUnit test constructor 
> and not in every test method. At the end of each test, either clean up 
> your HBase or use a different "area" per test.
>
> best regards,
> ulrich
>
>
> --
> connect on xing or linkedin. sent from my tablet.
>
> On 31.08.2012, at 06:46, Stack <[email protected]> wrote:
>
> > On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber 
> > <[email protected]> wrote:
> >> Hi there!
> >>
> >> After I started studying HBase, I searched for open source projects
> >> backed by HBase and found the Titan distributed graph database (you
> >> have probably heard of it). As soon as I read in their documentation
> >> that the HBase adapter is experimental and suboptimal (disclaimer here:
> >> https://github.com/thinkaurelius/titan/wiki/Using-HBase), I volunteered
> >> to help improve this adapter. Since then I have made a few changes to
> >> speed up the tests (reduced from hours to minutes) and also an
> >> improvement to the search feature.
> >>
> >> Now I'm trying to break the dependency on a pre-installed HBase for
> >> unit tests and found the mini cluster inside the HBase tests, but it
> >> takes too long to start and I don't know whether tweaking the configs
> >> will improve this significantly. Is there a way to start a 'lightweight'
> >> instance, such as programmatically starting a standalone instance?
> >>
> >
> > How much is 'too much time', Cristofer? Do you want a standalone
> > cluster at all?
> > St.Ack
> > P.S. If digging in this area, you might find the blog post by the 
> > sematextians of use:
> >
> > http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
>
