At a physical level HBase is append-only.

At a logical level, one can update data in HBase just like one can in any RDBMS.

The memstore/block cache and compaction logic are the mechanisms that bridge 
between these two views.
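
To make the two views concrete, here is a minimal toy sketch in Python. It is 
not HBase code: the class ToyLSMStore, its method names, and the 
flush_threshold parameter are all made up for illustration, and it leaves out 
the write-ahead log, the block cache, delete markers and multiple versions. It 
only shows that a logical update is a newer cell written elsewhere, that 
flushed files are never modified, and that compaction drops the obsolete cells.

```python
import itertools


class ToyLSMStore:
    """Toy model of an LSM store: one memstore plus immutable sorted files."""

    def __init__(self, flush_threshold=2):
        # In-memory buffer: key -> (sequence number, value).
        # Simplification: a real memstore would also keep older versions.
        self.memstore = {}
        # Immutable "hfiles" (tuples), oldest first, newest last.
        self.files = []
        self.flush_threshold = flush_threshold
        self._seq = itertools.count()

    def put(self, key, value):
        # A logical update never touches a flushed file; it only records
        # a newer cell in memory.
        self.memstore[key] = (next(self._seq), value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as a new sorted, immutable file.
        if self.memstore:
            self.files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Check memory first, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key][1]
        for hfile in reversed(self.files):
            for k, (_seq, value) in hfile:
                if k == key:
                    return value
        return None

    def compact(self):
        # Merge all files into one new file, keeping only the newest cell
        # per key. Old files are replaced, never edited in place.
        merged = {}
        for hfile in self.files:  # oldest to newest, so newer cells win
            for k, cell in hfile:
                merged[k] = cell
        self.files = [tuple(sorted(merged.items()))] if merged else []


store = ToyLSMStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # second put triggers a flush: first file
store.put("row1", "c")   # logical update of row1
store.put("row3", "d")   # second flush: second file
print(store.get("row1"), len(store.files))
store.compact()
print(store.get("row1"), len(store.files))
```

Before compaction the old value of row1 still sits in the first file; the read 
path simply finds the newer cell first. After compaction only one file remains 
and the obsolete cell is gone.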

What makes LSMs attractive performance-wise in comparison to traditional RDBMS 
storage architectures is that memory speeds and CPU speeds have increased at a 
faster rate than disk I/O transfer speeds.

Even in a traditional RDBMS, though, it is useful to periodically perform file 
reorganizations, that is, rewrite scattered disk blocks into sequence on disk. 
Many RDBMSs do this; Tandem did it way back in the 1980s, for example. But 
caches were not large enough to support an LSM-style architecture back then.

Dave

-----Original Message-----
From: Mich Talebzadeh [mailto:[email protected]] 
Sent: Friday, October 21, 2016 2:09 PM
To: [email protected]
Subject: Re: Hbase fast access

I was asked an interesting question.

Can one update data in HBase? My answer was that it is append-only.

Can one update data in Hive? My answer was yes, if the table is created as ORC 
with the table property "transactional"="true" set:


STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY", "transactional"="true", 
"orc.create.index"="true", "orc.bloom.filter.columns"="object_id",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="10000" )




Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from such 
loss, damage or destruction.



On 21 October 2016 at 22:01, Ted Yu <[email protected]> wrote:

> It is true in the sense that an hfile, once written (and closed), becomes 
> immutable.
>
> Compaction would remove obsolete content and generate new hfiles.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh < 
> [email protected]>
> wrote:
>
> > BTW. I always understood that Hbase is append only. is that generally
> > true?
> >
> > thx
> >
> > On 21 October 2016 at 21:57, Mich Talebzadeh 
> > <[email protected]>
> > wrote:
> >
> > > agreed much like any rdbms
> > >
> > >
> > >
> > > On 21 October 2016 at 21:54, Ted Yu <[email protected]> wrote:
> > >
> > >> Well, updates (in memory) would ultimately be flushed to disk, resulting
> > >> in new hfiles.
> > >>
> > >> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh < 
> > >> [email protected]>
> > >> wrote:
> > >>
> > >> > thanks
> > >> >
> > >> > bq. all updates are done in memory o disk access
> > >> >
> > >> > I meant data updates are operated in memory, no disk access.
> > >> >
> > >> > In other words, much like an RDBMS: read the data into memory and
> > >> > update it there (assuming that the data is not already in memory?)
> > >> >
> > >> > HTH
> > >> >
> > >> > On 21 October 2016 at 21:46, Ted Yu <[email protected]> wrote:
> > >> >
> > >> > > bq. this search is carried out through map-reduce on region servers?
> > >> > >
> > >> > > No map-reduce. region server uses its own thread(s).
> > >> > >
> > >> > > bq. all updates are done in memory o disk access
> > >> > >
> > >> > > Can you clarify ? There seems to be some missing letters.
> > >> > >
> > >> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh < 
> > >> > > [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > thanks
> > >> > > >
> > >> > > > having read the docs it appears to me that the main reasons for
> > >> > > > HBase being faster are:
> > >> > > >
> > >> > > >
> > >> > > >    1. It behaves like an RDBMS such as Oracle etc.: reads are looked for
> > >> > > >    in the buffer cache for consistent reads and, if not found, the store
> > >> > > >    files on disk are searched. Does this mean that this search is carried
> > >> > > >    out through map-reduce on region servers?
> > >> > > >    2. When data is written, it is written to the log file sequentially
> > >> > > >    first, then to the in-memory store, sorted like a B-tree in an RDBMS,
> > >> > > >    and then flushed to disk. This is exactly what a checkpoint in an
> > >> > > >    RDBMS does.
> > >> > > >    3. One can point out that HBase is faster because a log-structured
> > >> > > >    merge tree (LSM-tree) has less depth than a B-tree in an RDBMS.
> > >> > > >    4. all updates are done in memory o disk access
> > >> > > >    5. In summary, LSM-trees reduce disk access when data is read from
> > >> > > >    disk because of reduced seek time; again, less depth to get data with
> > >> > > >    an LSM-tree.
> > >> > > >
> > >> > > >
> > >> > > > appreciate any comments
> > >> > > >
> > >> > > >
> > >> > > > cheers
> > >> > > >
> > >> > > >
> > >> > > > On 21 October 2016 at 17:51, Ted Yu <[email protected]> wrote:
> > >> > > >
> > >> > > > > See some prior blog:
> > >> > > > >
> > >> > > > > http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
> > >> > > > >
> > >> > > > > w.r.t. compaction in Hive, it is used to compact deltas into a base
> > >> > > > > file (in the context of transactions).  Likely they're different.
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh < 
> > >> > > > > [email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi,
> > >> > > > > >
> > >> > > > > > Can someone explain in a nutshell HBase's use of the log-structured
> > >> > > > > > merge-tree (LSM-tree) as its data storage architecture?
> > >> > > > > >
> > >> > > > > > The idea of merging smaller files into larger files periodically to
> > >> > > > > > reduce disk seeks: is this a similar concept to compaction in HDFS or
> > >> > > > > > Hive?
> > >> > > > > >
> > >> > > > > > Thanks
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On 21 October 2016 at 15:27, Mich Talebzadeh <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Sorry that should read Hive not Spark here
> > >> > > > > > >
> > >> > > > > > > Say compared to Spark that is basically a SQL layer relying on
> > >> > > > > > > different engines (mr, Tez, Spark) to execute the code
> > >> > > > > > >
> > >> > > > > > > On 21 October 2016 at 13:17, Ted Yu <[email protected]> wrote:
> > >> > > > > > >
> > >> > > > > > >> Mich:
> > >> > > > > > >> Here is a brief description of the HBase architecture:
> > >> > > > > > >> https://hbase.apache.org/book.html#arch.overview
> > >> > > > > > >>
> > >> > > > > > >> You can also get more details from Lars George's or Nick Dimiduk's
> > >> > > > > > >> books.
> > >> > > > > > >>
> > >> > > > > > >> HBase doesn't support SQL directly. There is no cost-based
> > >> > > > > > >> optimization.
> > >> > > > > > >>
> > >> > > > > > >> Cheers
> > >> > > > > > >>
> > >> > > > > > >> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <[email protected]> wrote:
> > >> > > > > > >> >
> > >> > > > > > >> > Hi,
> > >> > > > > > >> >
> > >> > > > > > >> > This is a general question.
> > >> > > > > > >> >
> > >> > > > > > >> > Is HBase fast because it uses hash tables and provides random
> > >> > > > > > >> > access, and because it stores the data in indexed HDFS files for
> > >> > > > > > >> > faster lookups?
> > >> > > > > > >> >
> > >> > > > > > >> > Say compared to Spark that is basically a SQL layer relying on
> > >> > > > > > >> > different engines (mr, Tez, Spark) to execute the code (although it
> > >> > > > > > >> > has a Cost Based Optimizer), how does HBase fare, beyond relying on
> > >> > > > > > >> > these engines?
> > >> > > > > > >> >
> > >> > > > > > >> > Thanks
> > >> > > > > > >> >
> > >> > > > > > >> >
