Guys,

More than just needing some love. No HDFS… means data at risk. No HDFS… means that standalone will have security issues.
Patient data? HINT: HIPAA. Please think your design through, and if you go with HBase you will want to build out a small cluster.

> On Mar 10, 2015, at 6:16 PM, Nick Dimiduk <[email protected]> wrote:
>
> As Stack and Andrew said, just wanted to give you fair warning that this
> mode may need some love. Likewise, there are probably alternatives that run
> a bit lighter weight, though you flatter us with the reminder of the long
> feature list.
>
> I have no problem with helping to fix and committing fixes to bugs that
> crop up in local mode operations. Bring 'em on!
>
> -n
>
> On Tue, Mar 10, 2015 at 3:56 PM, Alex Baranau <[email protected]> wrote:
>
>> On:
>>
>> - Future investment in a design that scales better
>>
>> Indeed, designing against a key-value store is different from designing
>> against an RDBMS.
>>
>> I wonder if you explored the option of abstracting the storage layer and
>> using a "single node purposed" store until you grow enough to switch to
>> another one?
>>
>> E.g. you could use LevelDB [1], which is pretty fast (and there's a Java
>> rewrite of it, if you need Java APIs [2]). We use it in CDAP [3] in the
>> standalone version to make the development environment (SDK) lighter. We
>> swap it with HBase in distributed mode without changing the application
>> code. It doesn't have coprocessors and the other HBase-specific features
>> you are talking about, though. But you can figure out how to bridge
>> client APIs with an abstraction layer (e.g. we have a common Table
>> interface [4]). You can even add versions on cells (see [5] for an
>> example of how we do it).
>>
>> Also, you could use an RDBMS behind a key-value abstraction to start
>> with, while keeping your app design clean of RDBMS specifics.
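The abstraction-layer idea above (code against a narrow table interface, swap the backing store later) can be sketched roughly as follows. The `Table` interface and `InMemoryTable` class here are hypothetical stand-ins for illustration only, not CDAP's actual API; a real wiring would substitute a LevelDB- or HBase-backed implementation of the same interface.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal storage abstraction: application code depends only
// on this interface, never on the concrete store behind it.
interface Table {
    byte[] get(byte[] row, byte[] column);
    void put(byte[] row, byte[] column, byte[] value);
}

// Stand-in backend for development. In distributed mode you would swap in
// an HBase-backed Table without touching application code.
class InMemoryTable implements Table {
    private final Map<String, byte[]> store = new TreeMap<>();

    private static String key(byte[] row, byte[] column) {
        // Illustrative key encoding; a real store would keep row/column
        // as separate dimensions.
        return new String(row) + "|" + new String(column);
    }

    public byte[] get(byte[] row, byte[] column) {
        return store.get(key(row, column));
    }

    public void put(byte[] row, byte[] column, byte[] value) {
        store.put(key(row, column), value);
    }
}

public class TableDemo {
    public static void main(String[] args) {
        Table t = new InMemoryTable(); // the only line that picks a backend
        t.put("patient-1".getBytes(), "name".getBytes(), "Ada".getBytes());
        System.out.println(new String(t.get("patient-1".getBytes(), "name".getBytes())));
    }
}
```

The point of the sketch is that the backend choice is confined to a single construction site, which is what makes the "start small, grow into HBase" path cheap.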
>>
>> Alex Baranau
>>
>> [1] https://github.com/google/leveldb
>> [2] https://github.com/dain/leveldb
>> [3] http://cdap.io
>> [4] https://github.com/caskdata/cdap/blob/develop/cdap-api/src/main/java/co/cask/cdap/api/dataset/table/Table.java
>> [5] https://github.com/caskdata/cdap/blob/develop/cdap-data-fabric/src/main/java/co/cask/cdap/data2/dataset2/lib/table/leveldb/LevelDBTableCore.java
>>
>> --
>> http://cdap.io - open source framework to build and run data applications
>> on Hadoop & HBase
>>
>> On Tue, Mar 10, 2015 at 8:42 AM, Rose, Joseph <[email protected]> wrote:
>>
>>> Sorry, never answered your question about versions. I have version 1.0.0
>>> of HBase, which has hadoop-common 2.5.1 in its lib folder.
>>>
>>> -j
>>>
>>> On 3/10/15, 11:36 AM, "Rose, Joseph" <[email protected]> wrote:
>>>
>>>> I tried it and it does work now. It looks like the interface for
>>>> hadoop.fs.Syncable changed in March 2012 to remove the deprecated sync()
>>>> method and define only hsync() instead. The same committer did the right
>>>> thing and removed sync() from FSDataOutputStream at the same time. The
>>>> remaining hsync() method calls flush() if the underlying stream doesn't
>>>> implement Syncable.
>>>>
>>>> -j
>>>>
>>>> On 3/6/15, 5:24 PM, "Stack" <[email protected]> wrote:
>>>>
>>>>> On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph <[email protected]> wrote:
>>>>>
>>>>>> I think the final issue with hadoop-common (re: unimplemented sync for
>>>>>> local filesystems) is the one showstopper for us. We have to have
>>>>>> assured durability. I'm willing to devote some cycles to get it done,
>>>>>> so maybe I'm the one that says this problem is worthwhile.
>>>>>>
>>>>> I remember that was once the case, but looking in the codebase now, sync
>>>>> calls through to ProtobufLogWriter, which does a 'flush' on the output
>>>>> (though a comment says this is a noop).
>>>>> The output stream is an instance of FSDataOutputStream made with a
>>>>> RawLOS. The flush should come out here:
>>>>>
>>>>>     public void flush() throws IOException { fos.flush(); }
>>>>>
>>>>> ... where fos is an instance of FileOutputStream.
>>>>>
>>>>> In sync we go on to call hflush, which looks like it calls flush again.
>>>>>
>>>>> What hadoop/hbase versions are we talking about? HADOOP-8861 added the
>>>>> above behavior for hadoop 1.2.
>>>>>
>>>>> Try it, I'd say.
>>>>>
>>>>> St.Ack

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
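On the durability concern the thread circles around: with plain java.io on a local filesystem, flush() alone only hands bytes to the OS buffer cache; FileDescriptor.sync() is what forces them to the device, which is roughly the guarantee an hsync() that silently falls back to flush() would fail to give. A minimal sketch (the file name and demo class are illustrative, not anything from the HBase codebase):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class LocalSyncDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("wal", ".log");
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write("edit-1".getBytes());
            fos.flush();        // bytes reach the OS buffer cache only
            fos.getFD().sync(); // forces the OS to write them to the device
        }
        System.out.println("synced " + f.length() + " bytes");
        f.delete();
    }
}
```

A WAL writer that needs assured durability on a local filesystem would want the getFD().sync() step (or an equivalent) after every flush, not just the flush.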
