Hi Ryan,
I put an e.printStackTrace() call in the catch clause of the method where the
error occurs. Here is what I get reported back:
e = 'java.lang.IllegalArgumentException: KeyValue size too large'
java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:688)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:544)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:535)
at putPeptideDataIntoHBase.write_rnaSeqCountData_Into_rnaSeqCountTable(putPeptideDataIntoHBase.java:6235)
at putPeptideDataIntoHBase.inputRNAseqCountData(putPeptideDataIntoHBase.java:5773)
at putPeptideDataIntoHBase.invoke(putPeptideDataIntoHBase.java:1000)
at putPeptideDataIntoHBase.main(putPeptideDataIntoHBase.java:765)
So it looks like the error is occurring in the validatePut method.
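For reference, the kind of client-side size check that the trace points to can
be sketched as below. This is only an illustration, not the HBase source: the
class and method names are made up, and the 10 MB limit is an assumption based
on what is understood to be the default of the client property
hbase.client.keyvalue.maxsize (worth verifying against the hbase-default.xml
shipped with the installed version). The lengths are the ones mentioned in this
thread.

```java
// Sketch only: mimics a client-side KeyValue size check; NOT the HBase source.
final class KeyValueSizeCheck {

    // ASSUMPTION: 10 MB, the presumed default of hbase.client.keyvalue.maxsize.
    static final int ASSUMED_MAX_KEYVALUE_SIZE = 10 * 1024 * 1024; // 10485760

    // Throws the same exception type and message as in the trace when the
    // combined size exceeds the limit. Simplification: a real KeyValue also
    // carries family, qualifier, timestamp and length fields.
    static void validate(int keyLength, int valueLength, int maxSize) {
        // In this sketch a limit of zero or less means "check disabled".
        if (maxSize > 0 && (long) keyLength + valueLength > maxSize) {
            throw new IllegalArgumentException("KeyValue size too large");
        }
    }

    // Convenience wrapper that reports the outcome as a boolean.
    static boolean fits(int keyLength, int valueLength, int maxSize) {
        try {
            validate(keyLength, valueLength, maxSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int keyLength = 50; // "50 chars or so", per the thread
        System.out.println("5000000 fits:  "
                + fits(keyLength, 5000000, ASSUMED_MAX_KEYVALUE_SIZE));
        System.out.println("12897748 fits: "
                + fits(keyLength, 12897748, ASSUMED_MAX_KEYVALUE_SIZE));
    }
}
```

Under that assumed limit the 5,000,000-character string passes and the
12,897,748-character string is rejected, which matches the behavior seen here.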
I am using Hbase ver 0.89.20100726 on a 25-node cluster running Hadoop 0.20.2.
OS info is as follows:
[rtay...@h01 hbase]$ uname -a
Linux h01.emsl.pnl.gov 2.6.18-194.11.1.el5 #1 SMP Tue Jul 27 05:45:06 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
[rtay...@h01 hbase]$
[rtay...@h01 hbase]$ more /etc/redhat-release
Red Hat Enterprise Linux Client release 5.5 (Tikanga)
[rtay...@h01 hbase]$
Also: I just tried cutting the string down to 8.0 Meg instead of down to 5.0
Meg. That also works. So the problem starts to occur somewhere between a
string size of 8 and ~12.5 Meg.
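Whatever the exact limit turns out to be, one possible workaround is to split
the string into pieces that each stay under it and store each piece in its own
column, rather than truncating it. A minimal sketch using only the standard
library follows. The class name, the chunk size, and the numbered qualifier
suffix are hypothetical, and it assumes one byte per character, which holds for
a plain ASCII list of integers and colons.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits a long value into pieces of at most maxChars characters.
final class ValueChunker {

    static List<String> split(String value, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < value.length()) {
            // Take maxChars characters, or whatever is left for the last piece.
            int end = start + Math.min(maxChars, value.length() - start);
            chunks.add(value.substring(start, end));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real counts string.
        String counts = "1:22:333:4444";
        List<String> chunks = split(counts, 5);
        for (int i = 0; i < chunks.size(); i++) {
            // Hypothetical qualifier naming: base name plus a chunk index.
            String qualifier = "Chromo_Positive_Strand_rnaSeq_Counts_" + i;
            System.out.println(qualifier + " -> " + chunks.get(i));
        }
    }
}
```

Each piece would then go into its own p.add(...) call, and a reader would
concatenate the pieces in index order to recover the full string.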
Ron
___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA 99352 USA
Office: 509-372-6568
Email: [email protected]
-----Original Message-----
From: Ryan Rawson [mailto:[email protected]]
Sent: Sunday, October 03, 2010 12:23 AM
To: Taylor, Ronald C
Cc: [email protected]; [email protected]
Subject: Re: How do you increase the max cell size in Hbase?
What version of HBase are you using? Looking in the HBase source code, in the
likely place (KeyValue.createByteArray), the exceptions look more like this:
throw new IllegalArgumentException("Row > " + Short.MAX_VALUE);
Perhaps you could identify the call stack and log that? That would help a lot.
Also, the message suggests it might not come from HBase code...
-ryan
On Sat, Oct 2, 2010 at 11:35 PM, Taylor, Ronald C <[email protected]> wrote:
>
> FYI - there is nothing in the string itself that would cause an error. It's
> just a concatenated list of integer values, with colon separators between the
> numbers. I've checked it visually - and, yep, the string is indeed a simple
> ASCII list of integers with colon delimiters. Nothing weird in it. So the
> problem is something in regard to the length - not the string contents.
>
> Ron
>
> -----Original Message-----
> From: Taylor, Ronald C
> Sent: Saturday, October 02, 2010 11:25 PM
> To: 'Ryan Rawson'; [email protected]
> Cc: Taylor, Ronald C
> Subject: RE: How do you increase the max cell size in Hbase?
>
>
> Ryan,
>
> I just tried chopping the string to a max of 5 Meg, down from about 12 Meg at
> its largest, and the insertions appear to work fine. I do a scan afterwards
> and the fields and their contents appear to be all there. So - it would
> appear that I'm violating *some* length limit, somewhere, when I use the full
> 12 Meg string.
>
> Here's the relevant code for the version that worked, with the replacement of
> the full string with the 5 Meg string instead:
>
> rowID = CURRENT_SPECIES_ID_ABBREV + "_" + RNASEQ_RUNID + "_chromo_pos_strand";
> p = new Put(Bytes.toBytes(rowID));
>
> chromo_pos_strand_counts = buf.toString();
>
> System.out.println("size of chromo_pos_strand_counts = " + chromo_pos_strand_counts.length());
> System.out.println("writing chromo positive strand data to table ...\n\n");
>
> temp = chromo_pos_strand_counts.substring(0,5000000);
> // p.add(Bytes.toBytes(colFamily),
> //       Bytes.toBytes("Chromo_Positive_Strand_rnaSeq_Counts"),
> //       Bytes.toBytes(chromo_pos_strand_counts));
>
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("Chromo_Positive_Strand_rnaSeq_Counts"),Bytes.toBytes(temp));
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("Strand"),Bytes.toBytes("positive"));
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("Source"),Bytes.toBytes("chromosome"));
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("Species"),Bytes.toBytes(CURRENT_SPECIES_ID));
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("SpeciesAbbrev"),Bytes.toBytes(CURRENT_SPECIES_ID_ABBREV));
> p.add(Bytes.toBytes(colFamily),Bytes.toBytes("RunID"),Bytes.toBytes(RNASEQ_RUNID));
> rnaSeqCountTable.put(p);
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> Going back to using the full string, I get this error msg:
>
> size of chromo_pos_strand_counts = 12897748
> writing chromo positive strand data to table ...
>
> Exception java.lang.IllegalArgumentException: KeyValue size too large
> in write_rnaSeqCountData_Into_rnaSeqCountTable()
>
> e = 'java.lang.IllegalArgumentException: KeyValue size too large'
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> Any thoughts?
>
> Ron
>
> -----Original Message-----
> From: Ryan Rawson [mailto:[email protected]]
> Sent: Saturday, October 02, 2010 11:07 PM
> To: Taylor, Ronald C
> Cc: [email protected]
> Subject: Re: How do you increase the max cell size in Hbase?
>
> Hey,
>
> The max sizes of these things are determined by how many bits we use to
> specify lengths in the file format. Changing it isn't an option because the
> file format depends on it and it is fixed in the code.
>
> However you shouldn't be hitting limits based on the data you told us...
> perhaps if you could paste the exception backtrace it might indicate
> something useful.
>
> -ryan
>
> On Sat, Oct 2, 2010 at 10:56 PM, Taylor, Ronald C <[email protected]>
> wrote:
>>
>> Hi Ryan,
>>
>> Well, the key is only 50 chars or so, and the data in the cell is
>> about 12 Meg, in one string. So - obviously this is way less than the
>> 2 Gb limit for each. Does this have anything to do with the row
>> length, which is supposed to be kept to less than, from what you say
>> below, Short.MAX_LENGTH?
>>
>> I do not see MAX_LENGTH being set anywhere in the conf files (just did a
>> grep), so - what is the default row max length, and where would I reset it,
>> if needed?
>>
>> And if that's *not* the problem, any other possibilities for an error
>> msg of
>> KeyValue size too large
>>
>> being given on a rather prosaic Put insertion?
>>
>> Ron
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:[email protected]]
>> Sent: Saturday, October 02, 2010 10:42 PM
>> To: [email protected]
>> Cc: Taylor, Ronald C
>> Subject: Re: How do you increase the max cell size in Hbase?
>>
>> Hey,
>>
>> The limits are due to code/data limits, e.g. how many bits of space we use to
>> indicate lengths and such. This is something like ~2 GB for the "key" part
>> and the "value" part each. Furthermore, the row can only be Short.MAX_VALUE.
>>
>> There are specific exceptions for each one of these in trunk/0.89. Do you
>> have a specific exception text?
>>
>> -ryan
>>
>> On Sat, Oct 2, 2010 at 10:31 PM, Taylor, Ronald C <[email protected]>
>> wrote:
>>>
>>> Hello,
>>>
>>> I would like to increase the max cell size in one of my Hbase tables.
>>> Just got an error msg when trying to insert something about 12 Meg
>>> in size that said
>>>
>>> KeyValue size too large
>>>
>>> I presume that I'm using the default cell max size at present - I cannot
>>> find anything regarding cell size in the conf files, so the default setting
>>> must be in use.
>>>
>>> How do I increase the max size allowed? For example, if I want to allow a
>>> string of up to 20 Meg in size, which conf file do I change, and what is
>>> the precise wording?
>>>
>>> I googled around and found a note on MAX_LENGTH, but it is unclear
>>> how to set it and where. Do I do something like
>>>
>>> MAX_LENGTH=20
>>>
>>> in the conf/hbase-env.sh file?
>>>
>>> And, if that works, do I need to restart Hbase and then recreate all the
>>> tables in which I want to use the new max cell size?
>>>
>>> Cheers,
>>> Ron
>>>
>>
>