On:

- Future investment in a design that scales better

Indeed, designing against a key-value store is different from designing
against an RDBMS.

I wonder if you explored the option of abstracting the storage layer and
using a "single-node purposed" store until you grow enough to switch to
another one?

E.g. you could use LevelDB [1], which is pretty fast (and there's a Java
rewrite of it, if you need Java APIs [2]). We use it in CDAP [3] in
standalone mode to make the development environment (SDK) lighter. We
swap it for HBase in distributed mode without changing the application
code. It doesn't have coprocessors or the other HBase-specific features
you are talking about, though. But you can figure out how to bridge the
client APIs with an abstraction layer (e.g. we have a common Table
interface [4]). You can even add versions on cells (see [5] for an example
of how we do it).
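
To make the idea concrete, here is a minimal sketch of such an
abstraction layer. It is not CDAP's Table interface and it does not call
LevelDB or HBase: KeyValueTable, InMemoryTable and StorageAbstractionDemo
are hypothetical names made up for illustration, and a sorted in-memory
map stands in for the single-node store. The application only ever sees
the interface, so the backing store can be swapped later.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical storage abstraction: application code depends on this
// interface only, never on a concrete store.
interface KeyValueTable {
    // Writes one cell value at an explicit version (e.g. a timestamp).
    void put(String row, String column, long version, String value);

    // Returns the value with the highest version, if the cell exists.
    Optional<String> get(String row, String column);

    // Returns the newest value whose version is <= maxVersion, if any.
    Optional<String> get(String row, String column, long maxVersion);
}

// Single-node stand-in (assumption: an in-memory sorted map instead of a
// real embedded store). A distributed implementation would implement the
// same interface.
final class InMemoryTable implements KeyValueTable {
    // Cell key (row + NUL + column) -> (version -> value), versions sorted.
    private final Map<String, NavigableMap<Long, String>> cells =
            new ConcurrentSkipListMap<>();

    private static String cellKey(String row, String column) {
        return row + '\u0000' + column;
    }

    @Override
    public void put(String row, String column, long version, String value) {
        cells.computeIfAbsent(cellKey(row, column),
                k -> new ConcurrentSkipListMap<>()).put(version, value);
    }

    @Override
    public Optional<String> get(String row, String column) {
        return get(row, column, Long.MAX_VALUE);
    }

    @Override
    public Optional<String> get(String row, String column, long maxVersion) {
        NavigableMap<Long, String> versions = cells.get(cellKey(row, column));
        if (versions == null) {
            return Optional.empty();
        }
        // floorEntry picks the greatest version that is <= maxVersion.
        return Optional.ofNullable(versions.floorEntry(maxVersion))
                .map(Map.Entry::getValue);
    }
}

final class StorageAbstractionDemo {
    public static void main(String[] args) {
        // Only this line changes when the backing store is swapped.
        KeyValueTable table = new InMemoryTable();
        table.put("user1", "email", 1L, "old@example.com");
        table.put("user1", "email", 2L, "new@example.com");
        System.out.println(table.get("user1", "email").orElse("<none>"));
        System.out.println(table.get("user1", "email", 1L).orElse("<none>"));
    }
}
```

Keeping versions as part of the interface from the start means the
single-node store and a later distributed one can offer the same read
semantics to the application.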

Also, you could use an RDBMS behind a key-value abstraction to start with,
while keeping your app design clean of RDBMS specifics.

Alex Baranau

[1] https://github.com/google/leveldb
[2] https://github.com/dain/leveldb
[3] http://cdap.io
[4]
https://github.com/caskdata/cdap/blob/develop/cdap-api/src/main/java/co/cask/cdap/api/dataset/table/Table.java
[5]
https://github.com/caskdata/cdap/blob/develop/cdap-data-fabric/src/main/java/co/cask/cdap/data2/dataset2/lib/table/leveldb/LevelDBTableCore.java

--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase

On Tue, Mar 10, 2015 at 8:42 AM, Rose, Joseph <
joseph.r...@childrens.harvard.edu> wrote:

> Sorry, never answered your question about versions. I have 1.0.0 version
> of hbase, which has hadoop-common 2.5.1 in its lib folder.
>
>
> -j
>
>
> On 3/10/15, 11:36 AM, "Rose, Joseph" <joseph.r...@childrens.harvard.edu>
> wrote:
>
> >I tried it and it does work now. It looks like the interface for
> >hadoop.fs.Syncable changed in March, 2012 to remove the deprecated sync()
> >method and define only hsync() instead. The same committer did the right
> >thing and removed sync() from FSDataOutputStream at the same time. The
> >remaining hsync() method calls flush() if the underlying stream doesn't
> >implement Syncable.
> >
> >
> >-j
> >
> >
> >On 3/6/15, 5:24 PM, "Stack" <st...@duboce.net> wrote:
> >
> >>On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph <
> >>joseph.r...@childrens.harvard.edu> wrote:
> >>
> >>> I think the final issue with hadoop-common (re: unimplemented sync for
> >>> local filesystems) is the one showstopper for us. We have to have
> >>>assured
> >>> durability. I'm willing to devote some cycles to get it done, so maybe
> >>>I'm
> >>> the one that says this problem is worthwhile.
> >>>
> >>>
> >>I remember that was once the case but looking in codebase now, sync calls
> >>through to ProtobufLogWriter which does a 'flush' on output (though
> >>comment
> >>says this is a noop). The output stream is an instance of
> >>FSDataOutputStream made with a RawLOS. The flush should come out here:
> >>
> >>220     public void flush() throws IOException { fos.flush(); }
> >>
> >>... where fos is an instance of FileOutputStream.
> >>
> >>In sync we go on to call hflush which looks like it calls flush again.
> >>
> >>What hadoop/hbase versions we talking about? HADOOP-8861 added the above
> >>behavior for hadoop 1.2.
> >>
> >>Try it I'd say.
> >>
> >>St.Ack
> >
>
>
