Re: Using HBase in combination with HDFS directly

Peter Veentjer Wed, 05 Jan 2011 08:23:06 -0800

I just replaced the native-filesystem-based solution with HDFS without
introducing any additional servers, and it works perfectly in combination
with encryption of files. For the POC this is sufficient.


I think I have spent more time typing emails today than switching to
HDFS.

Thanks!

On Wed, Jan 5, 2011 at 5:06 PM, Peter Veentjer <[email protected]> wrote:

>
>
> On Wed, Jan 5, 2011 at 5:00 PM, Friso van Vollenhoven <
> [email protected]> wrote:
>
>> I guess so.
>>
>> HBase actually has quite a strong consistency model.
>
>
> It depends on how consistency is defined. HBase does not support repeatable
> reads because there is no concept of a transaction, so two reads of the same
> row can return different results if a write lands in between. In STM terms
> this would be called extremely low consistency. There are higher levels of
> consistency, like 'snapshot' consistency, where your reads are not only
> repeatable but also causally consistent. And then of course there is the
> serializable isolation level, where even write skews are prevented.
>
>
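
To make the difference between non-repeatable and snapshot reads concrete,
here is a minimal Python sketch. It assumes a toy in-memory multi-version
store; VersionedStore, Snapshot and their methods are hypothetical names for
illustration only and are not HBase or Multiverse APIs.

```python
class VersionedStore:
    """Toy multi-version key-value store (hypothetical, not an HBase API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit timestamp

    def put(self, key, value):
        # Every write commits a new version with a fresh timestamp.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))

    def read_latest(self, key):
        # Read without a transaction: always returns the newest version.
        return self._versions[key][-1][1]

    def read_at(self, key, ts):
        # Return the newest version committed at or before timestamp ts.
        for commit_ts, value in reversed(self._versions[key]):
            if commit_ts <= ts:
                return value
        raise KeyError(key)

    def begin_snapshot(self):
        # Pin the current timestamp; later writes stay invisible to it.
        return Snapshot(self, self._clock)


class Snapshot:
    """Read-only view of the store as of a fixed timestamp."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key):
        return self._store.read_at(key, self._ts)


store = VersionedStore()
store.put("row1", "v1")

first_plain = store.read_latest("row1")
snap = store.begin_snapshot()
first_snap = snap.read("row1")

store.put("row1", "v2")  # another writer commits between the two reads

second_plain = store.read_latest("row1")
second_snap = snap.read("row1")

print(first_plain, second_plain)  # v1 v2  (non-repeatable read)
print(first_snap, second_snap)    # v1 v1  (repeatable within the snapshot)
```

The plain reads see whatever was committed last, while the snapshot keeps
answering from the timestamp it was opened at. Preventing write skew would
additionally need conflict detection at commit time, which this sketch leaves
out.
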
>> Thing is, it is just row level. Multi-row transactions would require
>> multiple locks and some kind of commit / rollback solution. Have you had a
>> look at Google's Percolator paper?
>>
>
> Not yet. I'll check it out.
>
>
>>
>>
>> Friso
>>
>>
>>
>> On 5 jan 2011, at 16:49, Peter Veentjer wrote:
>>
>> > I also want to see if an STM like Multiverse can be aligned with NoSQL
>> > solutions like HBase. But to do that, I first need to get more hands on
>> > experience with NoSQL solutions.
>> >
>> > On Wed, Jan 5, 2011 at 4:34 PM, Peter Veentjer <[email protected]> wrote:
>> >
>> >>
>> >>
>> >> On Wed, Jan 5, 2011 at 4:03 PM, Friso van Vollenhoven <
>> >> [email protected]> wrote:
>> >>
>> >>> Hi Peter,
>> >>>
>> >>> Do you mean you want to use the HDFS that HBase relies on for other
>> >>> things and not just exclusively HBase? That should be just fine. We do
>> >>> it all the time.
>> >>>
>> >>>
>> >> Ok thanks.
>> >>
>> >>
>> >>
>> >>> Are you worried about putting too much load on it?
>> >>
>> >>
>> >> For the POC it won't matter that much. I can get my stuff up and running.
>> >>
>> >>
>> >>> I guess that depends on the type of workload that you have and what you
>> >>> do with it. But generally I think it is nice to have all nodes be the
>> >>> same (so all workers are both datanode and region server), so that you
>> >>> don't have to scale them out separately.
>> >>>
>> >>
>> >>> Peter, are you based in The Netherlands by any chance? There is a NoSQL
>> >>> meetup group in NL (http://www.meetup.com/nosql-nl/) with meetups every
>> >>> now and then. The next one is on January 24 and is all about HBase.
>> >>> We're doing an on-the-spot install on a number of the laptops present
>> >>> to create a temporary cluster and play around with it. I have been
>> >>> working with Hadoop and HBase for the past couple of months, so if you
>> >>> care to come by, I'd be happy to share some experiences.
>> >>
>> >> Yes, I live in Holland. I'm a former Xebia employee :) I think I'll visit
>> >> one of the NoSQL meetups.
>> >>
>> >> We are building a kind of application server where, instead of providing
>> >> services like JMS, Servlet, EJBs etc., we are providing services for
>> >> secured document storage, message exchange, semantic analysis of
>> >> documents etc. It is all based on GigaSpaces, but I have the impression
>> >> (after working with it for more than a year) that it is very time
>> >> consuming to get right. Apart from all the correctness issues (and there
>> >> were/are many, based on bad usage of GigaSpaces and architectural
>> >> choices) there are also some performance/scalability issues that need
>> >> solving.
>> >>
>> >> So I decided to rewrite the main use cases using HBase. I had most of the
>> >> functionality up and running in a few days, and most of the 'bad
>> >> architectural choices' we are going to remove in the next 6 months are
>> >> not there from the beginning (e.g. using streams instead of byte arrays
>> >> for document processing.. how stupid can you be). It also was a nice
>> >> exercise to play with HBase and less consistent solutions.
>> >>
>> >> I normally work on realizing very high consistency for Multiverse:
>> >>
>> >> http://multiverse.codehaus.org
>> >>
>> >> So I want to have some hands on experience with using less consistent
>> >> solutions.
>> >>
>> >>
>> >>>
>> >>> Friso
>> >>>
>> >>>
>> >>>
>> >>> On 5 jan 2011, at 14:41, Peter Veentjer wrote:
>> >>>
>> >>>> Hi Guys,
>> >>>>
>> >>>> I'm currently writing a POC based on HBase and I spend more time on
>> >>>> writing a UI than on writing the HBase functionality. So I'm very
>> >>>> excited about exploring HBase further, doing some serious performance
>> >>>> and scalability tests, and seeing if we can use it as core technology
>> >>>> instead of the time/resource-intensive GigaSpaces.
>> >>>>
>> >>>> My question:
>> >>>>
>> >>>> I'm currently using HBase and I also want to use the HDFS directly to
>> >>>> store files. If the HBase server(s) is installed, can I directly access
>> >>>> the HDFS of these servers, or is it better to set up a separate Hadoop
>> >>>> server for running HDFS?
>> >>>
>> >>>
>> >>
>>
>>
>
