Manjeet,

Your question has nothing to do with storing XML files in HBase and you
already have another thread asking questions about storing video. Please
don't resurrect old topics to ask the same thing twice.

On Sun, Jan 15, 2017 at 11:28 PM Manjeet Singh <manjeet.chand...@gmail.com>
wrote:

> I have the same question, but it's related to storing video files larger than
> 10 MB.
>
>
>
> Questions are:
>
> (1) How can I increase the 10 MB MOB size limit?
>
> (2) What is the performance impact? Does anybody have stats?
>
> (3) What is the recommended size for a MOB?
>
> (4)  HColumnDescriptor hc = new HColumnDescriptor("f");
>      hc.setMobEnabled(true);       // what is the setter method in hbase client 1.2.1
>      hc.setMobThreshold(102400L);  // what is the setter method in hbase client 1.2.1
>
>
>
>
>
> Thanks
>
> Manjeet
>
>
>
> On Tue, Nov 29, 2016 at 4:52 AM, Richard Startin <richardstar...@outlook.com>
> wrote:
>
>
>
> > In my experience it's better to keep the number of column families low.
> > When flushes occur, they affect all column families in a table, so when the
> > memstore fills you'll create an HFile per family. I haven't seen any
> > performance impact in having two column families though.
>
> >
>
> >
>
> > As for the number of columns, there are two extremes - 1) "narrow" - store
> > the XML as a blob in a single cell; 2) "wide" - break it out into columns, of
> > which you can have thousands.
>
> >
>
> >
>
> >   1.  In the case where you store XML as a blob you always need to
> > retrieve the entire document, and must deserialise it to perform
> > operations. You save space in not repeating the row key, and you save space
> > on column and column family qualifiers.
> >   2.  When you break the XML out into columns you can retrieve data at a
> > per-attribute level, which might save IO by filtering unnecessary content,
> > and you don't need to break open the XML to perform operations. You incur a
> > cost in repeating the row key per tuple (this can add up and will affect
> > read performance by limiting the number of rows that can fit into the block
> > cache), as well as the extra cost of column families. There is a practical
> > limit to the number of columns because a row cannot be split across regions.
>
> >
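The per-cell cost of the "wide" layout described above can be estimated with simple arithmetic. The following is a rough sketch, not a measurement: it assumes the uncompressed KeyValue layout (two 4-byte length fields, then row, family, qualifier, timestamp and type, then the value) and ignores tags, compression and block encoding. The class name, method names and sample sizes are hypothetical and chosen only for illustration.

```java
class CellOverheadSketch {
    // Approximate serialized size of one cell, ignoring tags, compression
    // and block encoding: 4-byte key length + 4-byte value length, then the key
    // (2-byte row length, row, 1-byte family length, family, qualifier,
    // 8-byte timestamp, 1-byte type), then the value bytes.
    static long cellSize(int rowLen, int familyLen, int qualifierLen, long valueLen) {
        long keyLen = 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
        return 4 + 4 + keyLen + valueLen;
    }

    // "Wide" layout: one cell per extracted attribute, so the row key and
    // family name are repeated in every cell.
    static long wideSize(int rowLen, int familyLen, int qualifierLen,
                         long valueLen, int columnCount) {
        return columnCount * cellSize(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 16-byte row key, 1-byte family name.
        // Narrow: a single 10,000-byte blob stored under a 3-byte qualifier.
        long narrow = cellSize(16, 1, 3, 10_000);
        // Wide: 1,000 columns with 8-byte qualifiers and 10-byte values.
        long wide = wideSize(16, 1, 8, 10, 1_000);
        System.out.println("narrow bytes: " + narrow);
        System.out.println("wide bytes:   " + wide);
    }
}
```

The absolute numbers matter less than the ratio: with many small values, the repeated key material can dominate the payload, which is the block cache effect mentioned above. Real tables with compression or data block encoding enabled will behave differently, so this is only a starting point for prototyping.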
>
> > You may find optimal performance for your use case somewhere between the
> > two extremes and it's best to prototype and measure early.
>
> >
>
> > Cheers,
>
> > Richard
>
> >
>
> >
>
> > https://richardstartin.com/
>
> >
>
> >
>
> > ________________________________
>
> > From: Mich Talebzadeh <mich.talebza...@gmail.com>
>
> > Sent: 28 November 2016 21:57
>
> > To: user@hbase.apache.org
>
> > Subject: Re: Storing XML file in Hbase
>
> >
>
> > Thanks Richard.
>
> >
>
> > How would one decide on the number of column family and columns?
>
> >
>
> > Is there a ballpark approach
>
> >
>
> > Cheers
>
> >
>
> > Dr Mich Talebzadeh
>
> >
>
> >
>
> >
>
> > LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
> >
>
> >
>
> >
>
> > http://talebzadehmich.wordpress.com
>
> >
>
> >
>
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising from
> > such loss, damage or destruction.
>
> >
>
> >
>
> >
>
> > On 28 November 2016 at 16:04, Richard Startin <richardstar...@outlook.com>
> > wrote:
>
> >
>
> > > Hi Mich,
>
> > >
>
> > > If you want to store the file whole, you'll need to enforce a 10MB limit
> > > on the file size, otherwise you will flush too often (each time the
> > > memstore fills up) which will slow down writes.
>
> > >
>
> > > Maybe you could deconstruct the xml by extracting columns from the xml
>
> > > using xpath?
>
> > >
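The XPath suggestion above can be sketched with the JDK's built-in XML APIs. This is a minimal illustration under stated assumptions, not a recommended schema: the class name, the `extractColumns` helper, the sample document and the qualifier names are all hypothetical. Each returned pair would correspond to one column in a "wide" row; writing the pairs to HBase is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XmlToColumnsSketch {
    // Evaluates each XPath expression against the document and returns
    // qualifier -> value pairs in insertion order.
    static Map<String, String> extractColumns(String xml, Map<String, String> qualifierToXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> columns = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : qualifierToXPath.entrySet()) {
                // evaluate(String, Object) yields the string value of the first
                // matching node, or an empty string when nothing matches.
                columns.put(e.getKey(), xpath.evaluate(e.getValue(), doc));
            }
            return columns;
        } catch (Exception ex) {
            // Malformed XML or an invalid expression: surface as unchecked.
            throw new IllegalArgumentException("could not extract columns", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document and column mapping.
        String xml = "<order id=\"42\"><customer>Ann</customer><total>9.50</total></order>";
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("id", "/order/@id");
        paths.put("customer", "/order/customer");
        paths.put("total", "/order/total");
        System.out.println(extractColumns(xml, paths));
    }
}
```

A mapping like this also makes it easy to try intermediate layouts, for example extracting only the few attributes that are queried often and keeping the rest of the document as a blob.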
>
> > > If the files are small there might be a tangible performance benefit by
>
> > > limiting the number of columns.
>
> > >
>
> > > Cheers,
>
> > > Richard
>
> > >
>
> > > Sent from my iPhone
>
> > >
>
> > > > On 28 Nov 2016, at 15:53, Dima Spivak <dimaspi...@apache.org> wrote:
>
> > > >
>
> > > > Hi Mich,
>
> > > >
>
> > > > How many files are you looking to store? How often do you need to read
> > > > them? What's the total size of all the files you need to serve?
>
> > > >
>
> > > > Cheers,
>
> > > > Dima
>
> > > >
>
> > > > On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> > > > wrote:
>
> > > >
>
> > > >> Hi,
>
> > > >>
>
> > > > >> Storing XML file in Big Data. Are there any strategies to create multiple
> > > > >> column families or just one column family and in that case how many columns
> > > > >> would be optimal?
>
> > > >>
>
> > > >> thanks
>
> > > >>
>
> > > >> Dr Mich Talebzadeh
>
> > > >>
>
> > > >>
>
> > > >>
>
> > > >> LinkedIn *
>
> > > >> https://www.linkedin.com/profile/view?id=
>
> > AAEAAAAWh2gBxianrbJd6zP6AcPCCd
>
> > > OABUrV8Pw
>
> > > >> <
>
> > > >> https://www.linkedin.com/profile/view?id=
>
> > AAEAAAAWh2gBxianrbJd6zP6AcPCCd
>
> > > OABUrV8Pw
>
> > > >>> *
>
> > > >>
>
> > > >>
>
> > > >>
>
> > > >> http://talebzadehmich.wordpress.com
>
> > > >>
>
> > > >>
>
> > > >> *Disclaimer:* Use it at your own risk. Any and all responsibility
> for
>
> > > any
>
> > > >> loss, damage or destruction of data or any other property which may
>
> > > arise
>
> > > >> from relying on this email's technical content is explicitly
>
> > disclaimed.
>
> > > >> The author will in no case be liable for any monetary damages
> arising
>
> > > from
>
> > > >> such loss, damage or destruction.
>
> > > >>
>
> > >
>
> >
>
>
>
>
>
>
>
> --
>
> luv all
>
>
