I came to the same conclusion you suggest: keep the attachments in separate files in HDFS and store only references in HBase.
I will try bundling several attachments into a single file. Thanks a lot.

On Wed, Feb 26, 2014 at 1:45 AM, Vladimir Rodionov <[email protected]> wrote:

> Usually, it is not advisable to store such large values in HBase (to
> avoid excessive IO during compaction).
> Keep them in separate files in HDFS and store only references in HBase.
> To overcome the NameNode's inherent limit on the number of files,
> you can bundle several values into a single file (you will need a separate
> process, such as an M/R job, to garbage collect expired or deleted items).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Ted Yu [[email protected]]
> Sent: Tuesday, February 25, 2014 12:02 PM
> To: [email protected]
> Subject: Re: Is HBase is feasible for storing 4-5 MB of data as cell value
>
> Minor:
> Value 0 also means no cap - see HTable#validatePut()
>
>     if (maxKeyValueSize > 0) {
>       ...
>         if (kv.getLength() > maxKeyValueSize) {
>           throw new IllegalArgumentException("KeyValue size too large");
>         }
>
> On Tue, Feb 25, 2014 at 11:52 AM, Ameya Kanitkar <[email protected]> wrote:
>
> > The only other thing I'd add is that, by default, HBase caps the size of
> > the data per column at 10 MB (I think). You can change that with this
> > setting in hbase-site.xml:
> >
> > hbase.client.keyvalue.maxsize
> >
> > -1 means no cap. You can set another number as the appropriate cap for
> > your use case.
> >
> > Ameya
> >
> > On Tue, Feb 25, 2014 at 12:12 AM, shashwat shriparv <[email protected]> wrote:
> >
> > > Yes, for sure you can use HBase for this. You can:
> > > 1. Keep the different fields of a mail in different columns of a column
> > >    family, with the attachment as a binary array in a column as well.
> > > 2. Keep the whole message in columns in HBase, put attachments that are
> > >    large enough on HDFS, and keep a reference to them in the HBase table.
> > > 3. Decide the schema yourself: you can draw up a matrix of how you store
> > >    the values and decide from that.
> > >
> > > Warm Regards,
> > > Shashwat Shriparv
> > >
> > > On Tue, Feb 25, 2014 at 12:55 PM, Upendra Yadav <[email protected]> wrote:
> > >
> > > > I have to use HBase and have mixed types of data.
> > > >
> > > > Some items are 1-4 KB (mail headers...) and others are
> > > > >5 MB (attachments...).
> > > >
> > > > We also need only random access to any data.
> > > >
> > > > Is HBase feasible for storing this type of data?
> > > >
> > > > What should my schema design be?
> > > > Will I have to go with 2 different tables, the 1st for the 1-4 KB items
> > > > and the 2nd for big files (because a memstore flush will also flush the
> > > > other CF, and because of the heavy random access)?
> > > >
> > > > Or is there another way?
> > > >
> > > > Thanks
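
For anyone following the thread, here is a rough sketch of the "several values in a single file, references in HBase" idea. It is only an illustration under assumptions: a local file stands in for HDFS, and the class and method names (AttachmentBundle, Ref, append, read) are made up for this example and are not part of HBase or HDFS. Each attachment is appended to one container file, and only the small (file, offset, length) reference would go into the HBase cell.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch only: packs several attachments into one container file and returns
 * a small reference per attachment. A local file stands in for HDFS, and all
 * names here are hypothetical (not HBase or HDFS APIs).
 */
public class AttachmentBundle {

    /** The small value that would be stored in an HBase cell instead of the bytes. */
    public static final class Ref {
        public final String file;   // container file holding the attachment
        public final long offset;   // byte position where the attachment starts
        public final int length;    // attachment size in bytes

        Ref(String file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Appends data to the end of the container file and returns where it landed. */
    public static Ref append(Path container, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(container.toFile(), "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.write(data);
            return new Ref(container.toString(), offset, data.length);
        }
    }

    /** Reads one attachment back using only its reference. */
    public static byte[] read(Ref ref) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(ref.file, "r")) {
            byte[] out = new byte[ref.length];
            raf.seek(ref.offset);
            raf.readFully(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path container = Files.createTempFile("attachments", ".bin");
        Ref first = append(container, "first attachment".getBytes(StandardCharsets.UTF_8));
        Ref second = append(container, "second attachment".getBytes(StandardCharsets.UTF_8));
        // Only these small references would be written to HBase.
        System.out.println(first.offset + " " + first.length);
        System.out.println(second.offset + " " + second.length);
        System.out.println(new String(read(second), StandardCharsets.UTF_8));
        Files.delete(container);
    }
}
```

Deleted or expired attachments are not reclaimed by this sketch; as Vladimir noted, a separate process would have to rewrite the container files and update the references.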
