Re: Standalone == Dev Only?

Michael Segel Sun, 08 Mar 2015 07:27:28 -0700

You’re dealing with patient data, which is either highly structured or 
semi-structured, so you can use an RDBMS if you really think about your 
schema.


If you want an RDBMS that can hold objects, look at Informix’s IDS, 
which is now IBM’s IDS. It is extensible enough that you could store 
objects using Avro if someone at IBM had built the DataBlades. (And yes, you 
can do ‘cell’, aka field-level, encryption too.) 

In terms of growth, how much data? 100TB or more? That seems to be the limit 
of systems like Oracle’s Exadata and Vertica. Informix does allow for federated 
queries; however, again, YMMV depending on your schema. 

Then there are options like HBase. 
But even for small amounts of data, you’re going to need a cluster of about 5 
datanodes for the RegionServers. You also need to consider what you are storing 
and how you are storing it. Security is another issue. And trust me, depending 
on your requirements, it can be a real bitch. 


There are other issues too, like stability. Depending on your admin’s skill, 
your schema and use case… YMMV. 

But if you have your heart set on HBase, also consider Splice Machine’s add-on, 
which gives some relational power to HBase. 

Not all RDBMSs are created equal. 
Not all NoSQL databases are created equal. 

If you need certain security features, think about Accumulo, or look at MapR’s 
releases. M3 is free, and if you want something that will scale and perform, 
look at MapR’s M7, aka MapR-DB.

And until any of the Apache vendors picks a hardware vendor, MapR is the only 
one that has a TPC.org <http://tpc.org/> benchmark under its belt, and since 
they were the first, they set the bar that others have to beat.


> On Mar 6, 2015, at 4:21 PM, Stack <[email protected]> wrote:
> 
> On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph <
> [email protected]> wrote:
> 
>> So, I think Nick, St.Ack and Wilm have all made some excellent points, but
>> this last email more or less hit it on the head. Like I said, I'm working
>> with patient data and while the volume is small now, it's not going to
>> stay that way. And the cell-level security is a *huge* win - I'm sure you
>> folks have some idea how happy that feature makes me. I'd also rather be
>> writing coprocessors than triggers or - heaven forbid - PL/SQL.
>> 
>> But there's another, more fundamental thing: we're exploring other DB
>> architectures because classical RDBMS systems haven't always worked out so
>> well. In fact, we're having a bit of a hard time with the current project
>> because we've been constrained (thus far) to a relational system and it
>> doesn't seem to be a clean fit. A key/val store, on the other hand, will
>> have enough flexibility to get the job done, I think. It's all being
>> prototyped now, so we'll see.
>> 
>> 
> Ok. Sounds like you know the +/-s. Was just checking.
> 
> 
> 
>> I think the final issue with hadoop-common (re: unimplemented sync for
>> local filesystems) is the one showstopper for us. We have to have assured
>> durability. I'm willing to devote some cycles to get it done, so maybe I'm
>> the one that says this problem is worthwhile.
>> 
>> 
> I remember that was once the case but looking in codebase now, sync calls
> through to ProtobufLogWriter which does a 'flush' on output (though comment
> says this is a noop). The output stream is an instance of
> FSDataOutputStream made with a RawLOS. The flush should come out here:
> 
>     public void flush() throws IOException { fos.flush(); }
> 
> ... where fos is an instance of FileOutputStream.
> 
> In sync we go on to call hflush which looks like it calls flush again.
> 
> What hadoop/hbase versions are we talking about? HADOOP-8861 added the above
> behavior for hadoop 1.2.
> 
> Try it I'd say.
> 
> St.Ack
> 
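For readers following the flush discussion above, here is a minimal sketch in plain java.io (not the Hadoop classes) of the distinction that durability hinges on. FileOutputStream does not override OutputStream.flush(), so that call does nothing by itself; asking the OS to push the data to the device takes FileDescriptor.sync(). The class name FlushVsSync and the helper writeDurably are hypothetical, and whether a given hadoop version's local filesystem stream ends up doing the equivalent of sync() is something to verify, not something this sketch shows.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSync {

    /** Hypothetical helper: write the bytes, then ask the OS to force them to the device. */
    static void writeDurably(Path path, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(path.toFile())) {
            fos.write(data);
            // Inherited from OutputStream; FileOutputStream does not override it,
            // so this call does not force anything to disk.
            fos.flush();
            // Asks the OS to write its buffers for this file to the underlying device.
            fos.getFD().sync();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wal-sketch", ".log");
        writeDurably(tmp, "edit-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

If the requirement is "assured durability" on a local filesystem, the thing to look for in the write path is a sync() (or FileChannel.force) call, not just flush().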
> 
> 
> 
> 
>> Thanks for chiming in. I'd love to hear more.
>> 
>> 
>> -j
>> 
>> 
>> On 3/6/15, 3:02 PM, "Wilm Schumacher" <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> On 06.03.2015 at 19:18, Stack wrote:
>>>> Why not use an RDBMS then?
>>> 
>>> When I first read the hbase documentation I also stumbled over the
>>> "only use for large datasets" or "standalone only in dev mode" etc. From
>>> my point of view there are some arguments against RDBMSs and for e.g.
>>> hbase, even though we are talking about a single-node application.
>>> 
>>> * scalability is a future investment. Even if the dataset is small now,
>>> that doesn't mean it will stay small in the future. Scalability in size
>>> and computing power is always a good idea.
>>> 
>>> * query language: for a user hbase is more of a database library than a
>>> "DBMS". For me this is a big plus, as it forces the user to do it the
>>> right way. Just think of SQL-injection. Or CQL-injection for that
>>> matter. Query languages are like scripting languages. Makes easy stuff
>>> easier and hard stuff harder.
>>> 
>>> * fancy features: hbase has fancy features RDBMSs don't have. E.g.
>>> coprocessors. I know that e.g. mysql has "triggers", but they are not
>>> nearly as powerful as coprocessors. And don't forget that you have to
>>> write most of the triggers in this *curse word* SQL language if you don't
>>> want to use evil hacks.
>>> 
>>> * schema-less: another HUGE plus is the possibility to use it without a
>>> fixed schema. In SQL you would need several tables and do a lot of
>>> joins. And the output is way harder to get and to parse.
>>> 
>>> * ecosystem: when you use hbase you automatically get the whole hadoop,
>>> or better apache foundation, ecosystem right away. Not only hdfs, but
>>> mapred, lucene, spark, kafka etc. etc..
>>> 
>>> There are only two real arguments against hbase in that scenario:
>>> 
>>> * joins etc.: well, in sql that's a question of minutes. In hbase that
>>> takes a little more effort. BUT: then it's done the right way ;).
>>> 
>>> * RDBMSs are more widely known: well ... that's not the fault of hbase ;).
>>> 
>>> Thus, I think that the hbase community should be more self-reliant for
>>> that matter, even and especially for applications in the SQL realm ;).
>>> Which is a good opportunity to say congratulations for the hbase 1.0
>>> milestone. And thank you for that.
>>> 
>>> Best wishes
>>> 
>>> Wilm
>>> 
>> 
>> 




The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
