It is true in the sense that an hfile, once written (and closed), becomes
immutable.

Compaction would remove obsolete content and generate new hfiles.
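To make the memstore / flush / compaction cycle concrete, here is a toy sketch in Python (all names such as `MiniLSM` are made up for illustration; this is not HBase code): writes go to a mutable in-memory buffer, flushes emit immutable sorted runs (the "hfiles"), and compaction merges the runs while discarding obsolete versions of each key.

```python
import bisect

class MiniLSM:
    """Toy LSM store: mutable memstore, immutable sorted runs, compaction.
    Illustrative only; HBase's real write path is far more involved."""

    def __init__(self, flush_threshold=4):
        self.memstore = {}              # mutable in-memory buffer (like the MemStore)
        self.hfiles = []                # immutable sorted runs of (key, value) pairs
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Writes only touch memory; an "update" is just a newer version of the key.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memstore is written out as a new immutable, sorted run
        # and is never edited in place afterwards.
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Check memory first, then runs from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.hfiles):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; later runs overwrite earlier ones,
        # so obsolete versions are dropped and a single new run is produced.
        merged = {}
        for run in self.hfiles:         # oldest first
            merged.update(dict(run))
        self.hfiles = [sorted(merged.items())]
```

For example, writing two versions of the same key leaves both on "disk" until `compact()` merges the runs and keeps only the newest one.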

Cheers

On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <[email protected]>
wrote:

> BTW, I always understood that Hbase is append-only. Is that generally true?
>
> thx
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 21 October 2016 at 21:57, Mich Talebzadeh <[email protected]>
> wrote:
>
> > agreed, much like any rdbms
> >
> >
> >
> >
> >
> >
> > On 21 October 2016 at 21:54, Ted Yu <[email protected]> wrote:
> >
> >> Well, updates (in memory) would ultimately be flushed to disk, resulting
> >> in
> >> new hfiles.
> >>
> >> On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <
> >> [email protected]>
> >> wrote:
> >>
> >> > thanks
> >> >
> >> > bq. all updates are done in memory o disk access
> >> >
> >> > I meant data updates are done in memory, no disk access.
> >> >
> >> > in other words, much like an rdbms: read data into memory and update it
> >> > there (assuming that data is not already in memory?)
> >> >
> >> > HTH
> >> >
> >> >
> >> >
> >> >
> >> > On 21 October 2016 at 21:46, Ted Yu <[email protected]> wrote:
> >> >
> >> > > bq. this search is carried out through map-reduce on region servers?
> >> > >
> >> > > No map-reduce. The region server uses its own thread(s).
> >> > >
> >> > > bq. all updates are done in memory o disk access
> >> > >
> >> > > Can you clarify ? There seems to be some missing letters.
> >> > >
> >> > > On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <
> >> > > [email protected]>
> >> > > wrote:
> >> > >
> >> > > > thanks
> >> > > >
> >> > > > having read the docs it appears to me that the main reason hbase is
> >> > > > faster is:
> >> > > >
> >> > > >    1. it behaves like an rdbms such as oracle etc.: reads are looked
> >> > > >    for in the buffer cache for consistent reads, and if not found
> >> > > >    then the store files on disk are searched. does this mean that
> >> > > >    this search is carried out through map-reduce on region servers?
> >> > > >    2. when the data is written, it is written to a log file
> >> > > >    sequentially first, then to the in-memory store, sorted like the
> >> > > >    b-tree of an rdbms, and then flushed to disk. this is exactly
> >> > > >    what a checkpoint in an rdbms does
> >> > > >    3. one can point out that hbase is faster because a log
> >> > > >    structured merge tree (LSM-tree) has less depth than a B-tree in
> >> > > >    an rdbms.
> >> > > >    4. all updates are done in memory o disk access
> >> > > >    5. in summary, LSM-trees reduce disk access when data is read
> >> > > >    from disk because of reduced seek time; again, less depth to get
> >> > > >    data with an LSM-tree
> >> > > >
> >> > > >
> >> > > > appreciate any comments
> >> > > >
> >> > > >
> >> > > > cheers
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On 21 October 2016 at 17:51, Ted Yu <[email protected]> wrote:
> >> > > >
> >> > > > > See some prior blog:
> >> > > > >
> >> > > > > http://www.cyanny.com/2014/03/13/hbase-architecture-
> >> > > > > analysis-part1-logical-architecture/
> >> > > > >
> >> > > > > w.r.t. compaction in Hive, it is used to compact deltas into a
> >> > > > > base file (in the context of transactions). Likely they're
> >> > > > > different.
> >> > > > >
> >> > > > > Cheers
> >> > > > >
> >> > > > > On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <
> >> > > > > [email protected]>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > Can someone explain, in a nutshell, the Hbase use of the
> >> > > > > > log-structured merge-tree (LSM-tree) as its data storage
> >> > > > > > architecture?
> >> > > > > >
> >> > > > > > The idea of merging smaller files into larger files periodically
> >> > > > > > to reduce disk seeks: is this a similar concept to compaction in
> >> > > > > > HDFS or Hive?
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On 21 October 2016 at 15:27, Mich Talebzadeh <
> >> > > > [email protected]>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Sorry that should read Hive not Spark here
> >> > > > > > >
> >> > > > > > > Say compared to Spark that is basically a SQL layer relying
> >> > > > > > > on different engines (mr, Tez, Spark) to execute the code
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On 21 October 2016 at 13:17, Ted Yu <[email protected]>
> >> wrote:
> >> > > > > > >
> >> > > > > > >> Mich:
> >> > > > > > >> Here is a brief description of hbase architecture:
> >> > > > > > >> https://hbase.apache.org/book.html#arch.overview
> >> > > > > > >>
> >> > > > > > >> You can also get more details from Lars George's or Nick
> >> > Dimiduk's
> >> > > > > > books.
> >> > > > > > >>
> >> > > > > > >> HBase doesn't support SQL directly. There is no cost based
> >> > > > > optimization.
> >> > > > > > >>
> >> > > > > > >> Cheers
> >> > > > > > >>
> >> > > > > > >> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <
> >> > > > > > [email protected]>
> >> > > > > > >> wrote:
> >> > > > > > >> >
> >> > > > > > >> > Hi,
> >> > > > > > >> >
> >> > > > > > >> > This is a general question.
> >> > > > > > >> >
> >> > > > > > >> > Is Hbase fast because Hbase uses hash tables and provides
> >> > > > > > >> > random access, and it stores the data in indexed HDFS
> >> > > > > > >> > files for faster lookups?
> >> > > > > > >> >
> >> > > > > > >> > Say compared to Spark, that is basically a SQL layer
> >> > > > > > >> > relying on different engines (mr, Tez, Spark) to execute
> >> > > > > > >> > the code (although it has a Cost Based Optimizer), how
> >> > > > > > >> > does Hbase fare beyond relying on these engines?
> >> > > > > > >> >
> >> > > > > > >> > Thanks
> >> > > > > > >> >
> >> > > > > > >> >
> >> > > > > > >>
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
