As Stack and Andrew said, just wanted to give you fair warning that this mode may need some love. Likewise, there are probably alternatives that run a bit lighter weight, though you flatter us with the reminder of the long feature list.
I have no problem with helping to fix and committing fixes to bugs that
crop up in local mode operations. Bring 'em on!

-n

On Tue, Mar 10, 2015 at 3:56 PM, Alex Baranau <[email protected]> wrote:

> On:
>
> - Future investment in a design that scales better
>
> Indeed, designing against a key-value store is different from designing
> against an RDBMS.
>
> I wonder if you explored an option to abstract the storage layer and use
> a "single node purposed" store until you grow enough to switch to another
> one?
>
> E.g. you could use LevelDB [1], which is pretty fast (and there's a Java
> rewrite of it, if you need Java APIs [2]). We use it in CDAP [3] in the
> standalone version to make the development environment (SDK) lighter. We
> swap it with HBase in distributed mode without changing the application
> code. It doesn't have coprocessors and the other HBase-specific features
> you are talking about, though. But you can figure out how to bridge
> client APIs with an abstraction layer (e.g. we have a common Table
> interface [4]). You can even add versions on cells (see [5] for an
> example of how we do it).
>
> Also, you could use an RDBMS behind the key-value abstraction to start
> with, while keeping your app design clean of RDBMS specifics.
>
> Alex Baranau
>
> [1] https://github.com/google/leveldb
> [2] https://github.com/dain/leveldb
> [3] http://cdap.io
> [4] https://github.com/caskdata/cdap/blob/develop/cdap-api/src/main/java/co/cask/cdap/api/dataset/table/Table.java
> [5] https://github.com/caskdata/cdap/blob/develop/cdap-data-fabric/src/main/java/co/cask/cdap/data2/dataset2/lib/table/leveldb/LevelDBTableCore.java
>
> --
> http://cdap.io - open source framework to build and run data applications
> on Hadoop & HBase
>
> On Tue, Mar 10, 2015 at 8:42 AM, Rose, Joseph
> <[email protected]> wrote:
>
> > Sorry, never answered your question about versions. I have the 1.0.0
> > version of HBase, which has hadoop-common 2.5.1 in its lib folder.
> >
> > -j
> >
> > On 3/10/15, 11:36 AM, "Rose, Joseph" <[email protected]>
> > wrote:
> >
> > > I tried it and it does work now. It looks like the interface for
> > > hadoop.fs.Syncable changed in March 2012 to remove the deprecated
> > > sync() method and define only hsync() instead. The same committer did
> > > the right thing and removed sync() from FSDataOutputStream at the
> > > same time. The remaining hsync() method calls flush() if the
> > > underlying stream doesn't implement Syncable.
> > >
> > > -j
> > >
> > > On 3/6/15, 5:24 PM, "Stack" <[email protected]> wrote:
> > >
> > > > On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph
> > > > <[email protected]> wrote:
> > > >
> > > > > I think the final issue with hadoop-common (re: unimplemented
> > > > > sync for local filesystems) is the one showstopper for us. We
> > > > > have to have assured durability. I'm willing to devote some
> > > > > cycles to get it done, so maybe I'm the one that says this
> > > > > problem is worthwhile.
> > > >
> > > > I remember that was once the case but, looking in the codebase now,
> > > > sync calls through to ProtobufLogWriter, which does a 'flush' on
> > > > the output (though the comment says this is a noop). The output
> > > > stream is an instance of FSDataOutputStream made with a RawLOS. The
> > > > flush should come out here:
> > > >
> > > > 220 public void flush() throws IOException { fos.flush(); }
> > > >
> > > > ... where fos is an instance of FileOutputStream.
> > > >
> > > > In sync we go on to call hflush, which looks like it calls flush
> > > > again.
> > > >
> > > > What hadoop/hbase versions we talking about? HADOOP-8861 added the
> > > > above behavior for hadoop 1.2.
> > > >
> > > > Try it I'd say.
> > > >
> > > > St.Ack
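The hsync() fallback Joseph describes above can be sketched in a few lines. This is a self-contained illustration, not Hadoop's actual code: the `Syncable` interface and `DataOutputStreamSketch` class below are hypothetical stand-ins for `hadoop.fs.Syncable` and `FSDataOutputStream`, showing only the delegate-or-flush behavior discussed in the thread.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for hadoop.fs.Syncable after the 2012 change:
// the deprecated sync() is gone and only hsync() remains.
interface Syncable {
    void hsync() throws IOException;
}

// Stand-in for FSDataOutputStream: hsync() delegates to the wrapped
// stream when it is Syncable, otherwise it falls back to flush().
class DataOutputStreamSketch extends OutputStream implements Syncable {
    private final OutputStream out;

    DataOutputStreamSketch(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void hsync() throws IOException {
        if (out instanceof Syncable) {
            ((Syncable) out).hsync();  // durable sync where supported
        } else {
            out.flush();               // fallback: flush only, no fsync
        }
    }
}
```

With a plain local stream the fallback path means data may still sit in OS buffers after hsync() returns, which is exactly the local-filesystem durability concern raised earlier in the thread.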

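Alex's storage-abstraction suggestion can be sketched as a minimal key-value table interface with an in-memory backend. CDAP's real Table interface (link [4] above) is considerably richer; the names below are illustrative only, assuming a simple row/column string-keyed model.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical minimal key-value table abstraction. Application code
// programs against this interface; the backend (LevelDB, HBase, an
// RDBMS behind a key-value facade) is chosen at deployment time.
interface KeyValueTable {
    void put(String row, String column, byte[] value);
    byte[] get(String row, String column);  // null if absent
}

// In-memory backend, the moral equivalent of a standalone LevelDB mode:
// a sorted map keyed by "row:column", as a key-value store would lay
// the cells out.
class InMemoryTable implements KeyValueTable {
    private final ConcurrentSkipListMap<String, byte[]> store =
            new ConcurrentSkipListMap<>();

    private static String key(String row, String column) {
        return row + ":" + column;
    }

    @Override
    public void put(String row, String column, byte[] value) {
        store.put(key(row, column), value);
    }

    @Override
    public byte[] get(String row, String column) {
        return store.get(key(row, column));
    }
}
```

Swapping in a LevelDB- or HBase-backed implementation then changes no application code, which is the portability Alex describes between the standalone SDK and distributed mode.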