It is true in the sense that an HFile, once written (and closed), becomes immutable.
Compaction would remove obsolete content and generate new HFiles.

Cheers

On Fri, Oct 21, 2016 at 1:59 PM, Mich Talebzadeh <[email protected]> wrote:

> BTW, I always understood that HBase is append-only. Is that generally true?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 21 October 2016 at 21:57, Mich Talebzadeh <[email protected]> wrote:

> Agreed, much like any RDBMS.

On 21 October 2016 at 21:54, Ted Yu <[email protected]> wrote:

> Well, updates (in memory) would ultimately be flushed to disk, resulting in new HFiles.

On Fri, Oct 21, 2016 at 1:50 PM, Mich Talebzadeh <[email protected]> wrote:

> Thanks.
>
> bq. all updates are done in memory o disk access
>
> I meant that data updates are operated on in memory, with no disk access. In other words, much like an RDBMS: read the data into memory and update it there (assuming that the data is not already in memory?).
>
> HTH

On 21 October 2016 at 21:46, Ted Yu <[email protected]> wrote:

> bq. this search is carried out through map-reduce on region servers?
>
> No map-reduce. The region server uses its own thread(s).
>
> bq. all updates are done in memory o disk access
>
> Can you clarify? There seem to be some missing letters.

On Fri, Oct 21, 2016 at 1:43 PM, Mich Talebzadeh <[email protected]> wrote:

> Thanks.
>
> Having read the docs, it appears to me that the main reasons HBase is fast are:
>
> 1. It behaves like an RDBMS such as Oracle etc.: reads are looked for in the buffer cache for consistent reads and, if not found, the store files on disk are searched. Does this mean that this search is carried out through map-reduce on region servers?
> 2. When data is written, it is written to a log file sequentially first, then to the in-memory store, sorted like the B-tree of an RDBMS, and then flushed to disk. This is exactly what a checkpoint in an RDBMS does.
> 3. One can point out that HBase is faster because a log-structured merge tree (LSM-tree) has less depth than a B-tree in an RDBMS.
> 4. All updates are done in memory o disk access.
> 5. In summary, LSM-trees reduce disk access when data is read from disk because of reduced seek time; again, there is less depth to get at the data with an LSM-tree.
>
> Appreciate any comments.
>
> Cheers
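The write path described in points 2 and 5 of the list above (sequential log append first, then a sorted in-memory store, then a flush to an immutable file) can be illustrated with a toy Python sketch. This is not HBase's actual code; `ToyLSMStore` and all names in it are made up for illustration.

```python
# Toy sketch (not HBase code) of the LSM write path described above:
# 1) append the mutation to a sequential write-ahead log,
# 2) apply it to a sorted in-memory store (the "memstore"),
# 3) flush the memstore to an immutable sorted file when it grows too large.

class ToyLSMStore:
    def __init__(self, flush_threshold=4):
        self.wal = []                 # sequential log (stands in for the on-disk WAL)
        self.memstore = {}            # in-memory key -> value
        self.flushed_files = []       # each flush produces an immutable sorted "HFile"
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))      # 1) sequential append, no random I/O
        self.memstore[key] = value         # 2) the update itself happens in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3) write the memstore out as a sorted, immutable file and start fresh
        self.flushed_files.append(sorted(self.memstore.items()))
        self.memstore = {}

store = ToyLSMStore()
for i in range(5):
    store.put(f"row{i}", f"v{i}")

print(len(store.flushed_files))   # one flush happened when the threshold was hit
print(len(store.memstore))        # the last row is still only in memory (and the WAL)
```

Note that random disk I/O never occurs on the write path here: the only disk-shaped structures (the log and the flushed files) are written sequentially, which is the essence of the LSM argument in the list above.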
On 21 October 2016 at 17:51, Ted Yu <[email protected]> wrote:

> See some prior blog:
> http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
>
> w.r.t. compaction in Hive, it is used to compact deltas into a base file (in the context of transactions). Likely they're different.
>
> Cheers

On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <[email protected]> wrote:

> Hi,
>
> Can someone explain in a nutshell the HBase use of the log-structured merge-tree (LSM-tree) as its data storage architecture?
>
> The idea of merging smaller files into larger files periodically to reduce disk seeks: is this a similar concept to compaction in HDFS or Hive?
>
> Thanks

On 21 October 2016 at 15:27, Mich Talebzadeh <[email protected]> wrote:

> Sorry, that should read Hive, not Spark, here:
>
> "Say compared to Spark that is basically a SQL layer relying on different engines (mr, Tez, Spark) to execute the code"
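The file-merging idea raised above (periodically merging smaller sorted files into a larger one while discarding obsolete versions, which is what compaction does with HFiles) can be sketched as a toy example. This is not HBase's implementation; the `compact` function and the `(key, sequence, value)` entry layout are hypothetical simplifications.

```python
import heapq

# Toy sketch (not HBase's implementation) of a compaction:
# merge several sorted, immutable files into one new sorted file,
# keeping only the newest version of each key and dropping the rest.
# Each entry is (key, sequence_number, value); a higher sequence is newer.

def compact(sorted_files):
    merged = heapq.merge(*sorted_files)   # streaming merge of the sorted runs
    result = []
    for key, seq, value in merged:
        if result and result[-1][0] == key:
            # same key seen again: keep whichever version is newer
            if seq > result[-1][1]:
                result[-1] = (key, seq, value)
        else:
            result.append((key, seq, value))
    return result  # one larger sorted file; the inputs would then be deleted

older = [("a", 1, "a-old"), ("b", 1, "b-v1")]
newer = [("a", 2, "a-new"), ("c", 2, "c-v1")]
print(compact([older, newer]))
# [('a', 2, 'a-new'), ('b', 1, 'b-v1'), ('c', 2, 'c-v1')]
```

Because every input file is already sorted, the merge is a single sequential pass with no random seeks, which is why fewer, larger files mean cheaper reads afterwards.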
On 21 October 2016 at 13:17, Ted Yu <[email protected]> wrote:

> Mich:
> Here is a brief description of the HBase architecture:
> https://hbase.apache.org/book.html#arch.overview
>
> You can also get more details from Lars George's or Nick Dimiduk's books.
>
> HBase doesn't support SQL directly. There is no cost-based optimization.
>
> Cheers

On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <[email protected]> wrote:

> Hi,
>
> This is a general question.
>
> Is HBase fast because it uses hash tables and provides random access, and because it stores the data in indexed HDFS files for faster lookups?
>
> Say compared to Spark, which is basically a SQL layer relying on different engines (mr, Tez, Spark) to execute the code (although it has a Cost Based Optimizer), how does HBase fare, beyond relying on these engines?
>
> Thanks
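As a footnote to the read-path discussion in this thread (no MapReduce: the region server answers a get itself, checking memory before the on-disk store files), here is a toy sketch. It is not HBase code; real HFiles carry block indexes and sit behind a block cache, whereas this version simply scans.

```python
# Toy sketch (not HBase code) of the read path discussed above: a get is
# served by the region server itself (no MapReduce) by checking the
# in-memory store first and then the on-disk store files, newest first.

def get(key, memstore, store_files):
    if key in memstore:                # analogous to finding the row in memory
        return memstore[key]
    for f in reversed(store_files):    # newer flushed files shadow older ones
        for k, v in f:                 # real HFiles are indexed; scanning is the toy part
            if k == key:
                return v
    return None

memstore = {"row3": "v3-new"}
store_files = [[("row1", "v1"), ("row3", "v3-old")], [("row2", "v2")]]
print(get("row3", memstore, store_files))  # "v3-new", served from memory
print(get("row1", memstore, store_files))  # "v1", found in an older store file
```

The newest-first search order is what lets an update "win" over stale on-disk versions without ever rewriting the immutable files; compaction later removes the shadowed entries for good.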
