Re: Storing XML file in Hbase

2017-01-16 Thread Dima Spivak
Manjeet,

Your question has nothing to do with storing XML files in HBase and you
already have another thread asking questions about storing video. Please
don't resurrect old topics to ask the same thing twice.


Re: Storing XML file in Hbase

2017-01-15 Thread Manjeet Singh
I have the same question, but it relates to storing video files larger than 10 MB.

Questions are:
(1) How can I increase the 10 MB MOB size?
(2) What is the performance impact? Does anybody have stats?
(3) What is the recommended size for a MOB?
(4) What are the equivalent setter methods in hbase client 1.2.1 for the calls below?

 HColumnDescriptor hc = new HColumnDescriptor("f");
 hc.setMobEnabled(true);       // what is the setter method in hbase client 1.2.1?
 hc.setMobThreshold(102400L);  // what is the setter method in hbase client 1.2.1?


Thanks
Manjeet




-- 
luv all


Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
In my experience it's better to keep the number of column families low. When
flushes occur, they affect all column families in a table, so when the memstore
fills you'll create an HFile per family. I haven't seen any performance impact
from having two column families, though.


As for the number of columns, there are two extremes: 1) "narrow", where you
store the XML as a blob in a single cell; 2) "wide", where you break it out
into columns, of which you can have thousands.


  1.  If you store the XML as a blob, you always need to retrieve the entire
document and must deserialise it to perform operations. You save space by not
repeating the row key, and you save space on column qualifiers and column
family names.
  2.  If you break the XML out into columns, you can retrieve data at a
per-attribute level, which might save IO by filtering out unnecessary content,
and you don't need to break open the XML to perform operations. You incur a
cost in repeating the row key per tuple (this can add up and will affect read
performance by limiting the number of rows that can fit into the block cache),
as well as the extra cost of column families. There is a practical limit to the
number of columns because a row cannot be split across regions.

You may find optimal performance for your use case somewhere between the two
extremes, and it's best to prototype and measure early.
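
To make the row-key repetition cost concrete, here is a rough sketch that estimates stored bytes for the two layouts. It assumes the plain KeyValue layout (fixed-width length fields, a timestamp and a type byte around the row key, family, qualifier and value) and ignores tags, compression and data block encoding, so the numbers are only an illustration, not what ends up on disk. The class name, row key, family "d" and field names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Rough per-cell size estimate; ignores tags, compression and block encoding. */
final class CellOverheadSketch {
    // Fixed bytes per cell in the plain KeyValue layout:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Estimated bytes for one cell holding valueBytes of payload. */
    static long cellSize(String rowKey, String family, String qualifier, long valueBytes) {
        return FIXED_OVERHEAD + utf8Length(rowKey) + utf8Length(family)
                + utf8Length(qualifier) + valueBytes;
    }

    /** Wide layout: one cell per field, each repeating the row key and family. */
    static long wideSize(String rowKey, String family, Map<String, String> fields) {
        long total = 0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            total += cellSize(rowKey, family, field.getKey(), utf8Length(field.getValue()));
        }
        return total;
    }

    public static void main(String[] args) {
        String rowKey = "order-000001"; // hypothetical row key
        String xml = "<order id=\"42\"><customer>ACME</customer><total>19.99</total></order>";

        // Hypothetical fields extracted from the document above.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "42");
        fields.put("customer", "ACME");
        fields.put("total", "19.99");

        System.out.println("narrow bytes: " + cellSize(rowKey, "d", "xml", utf8Length(xml)));
        System.out.println("wide bytes:   " + wideSize(rowKey, "d", fields));
    }
}
```

With short values the fixed overhead and the repeated row key dominate the wide layout, which is why short row keys and qualifiers matter more as the column count grows.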

Cheers,
Richard


https://richardstartin.com/





Re: Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Thanks Richard.

How would one decide on the number of column families and columns?

Is there a ballpark approach?

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
Hi Mich,

If you want to store the file whole, you'll need to enforce a 10MB limit on the
file size, otherwise you will flush too often (each time the memstore fills up),
which will slow down writes.
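
A minimal sketch of enforcing that limit on the client before a write is attempted. It assumes "10MB" means 10 * 1024 * 1024 bytes; the class and method names are hypothetical, and the actual write to HBase is left out.

```java
/** Rejects documents that are too large to store whole in a single cell. */
final class BlobSizeGuardSketch {
    // Assumption: the 10MB ceiling is read as 10 * 1024 * 1024 bytes.
    static final long MAX_CELL_BYTES = 10L * 1024 * 1024;

    static boolean fitsInOneCell(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_CELL_BYTES;
    }

    /** Returns the document unchanged, or throws if it exceeds the limit. */
    static byte[] requireStorable(byte[] document) {
        if (!fitsInOneCell(document.length)) {
            throw new IllegalArgumentException(
                    "document is " + document.length + " bytes, limit is " + MAX_CELL_BYTES);
        }
        return document;
    }

    public static void main(String[] args) {
        byte[] small = new byte[1024];
        System.out.println("1 KB document storable: " + fitsInOneCell(small.length));
        System.out.println("11 MB document storable: " + fitsInOneCell(11L * 1024 * 1024));
    }
}
```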

Maybe you could deconstruct the XML by extracting columns from it using
XPath?
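
One way that extraction could look, as a sketch using only the JDK's built-in XPath support. The sample document, the qualifier names and the XPath expressions are all made up; the resulting map would still have to be turned into column writes by whatever client code you use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XPathColumnsSketch {
    /** Evaluates each XPath against the document and returns qualifier -> value. */
    static Map<String, String> extractColumns(String xml, Map<String, String> qualifierToXPath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the parser to apply its secure-processing limits to untrusted input.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> columns = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : qualifierToXPath.entrySet()) {
                // evaluate(String, Object) returns the string value of the first
                // match, or an empty string when nothing matches.
                columns.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc));
            }
            return columns;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order id=\"42\"><customer>ACME</customer><total>19.99</total></order>";

        // Hypothetical mapping from column qualifier to XPath expression.
        Map<String, String> paths = new LinkedHashMap<>();
        paths.put("id", "/order/@id");
        paths.put("customer", "/order/customer");
        paths.put("total", "/order/total");

        System.out.println(extractColumns(xml, paths));
    }
}
```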

If the files are small, there might be a tangible performance benefit from
limiting the number of columns.

Cheers,
Richard



Re: Storing XML file in Hbase

2016-11-28 Thread Dima Spivak
Hi Mich,

How many files are you looking to store? How often do you need to read
them? What's the total size of all the files you need to serve?

Cheers,
Dima



Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Hi,

I am looking at storing XML files in HBase. Are there any strategies for
choosing between multiple column families and just one column family, and in
the latter case, how many columns would be optimal?

thanks

Dr Mich Talebzadeh


