[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-11-20 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259848#comment-16259848
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

After some discussion offline with [~mdrob] around the same comments from 
[~anoopsamjohn] the only approach to address this correctly is by having a new 
HFileV4 format without this kind of limitations. For now I'm going to create a 
new issue to add a guard rail to avoid accepting a key near to 
{{Short.MAX_VALUE}} in order to avoid triggering this problem.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202225#comment-16202225
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

[~mdrob], thats correct and there is no way to avoid that except avoiding keys 
that would have maxed out the row key length. Basically instead of checkRow() 
verifying if a row is larger than Short.MAX_VALUE we should verify if the row 
will blow the limitation of Short.MAX_VALUE + the region name overhead later on 
a split.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202106#comment-16202106
 ] 

Mike Drob commented on HBASE-18987:
---

Split length being shorter than key length means that we can potentially have 
too many rows in a single region and fail when trying to split? i.e. we decide 
that we need to split on xxxy but that truncates to xxx and xxx is already a 
split point so this turns into a no-op, but then the automatic splitting code 
kicks back in?

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202080#comment-16202080
 ] 

Anoop Sam John commented on HBASE-18987:


bq. truncate approach might be the only alternative for now.
Yep.  The split point has to limit its length considering the extra bytes 
length we will take put put this in META region.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202042#comment-16202042
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

bq. Not in testOversizedRegionNameForPut 
[~mdrob]: thanks!

bq. The value length can be upto Integer.MAX_VALUE - 1 as we use 4 bytes to 
store that. But for the row length it is 2 bytes right? Then allowing 
Integer.MAX_VALUE - 1 for RK length also correct?

[~anoopsamjohn]: yeah, you are right. The problem seems to run deeper: The 
KeyValue constructor accepts an integer for rlength but there are few more 
places where we only use a short: {{createEmptyByteArray}} will test if rlength 
is greater than {{Short.MAX_VALUE}} and {{rowLen}} on {{KeyOnlyKeyValue}} is a 
short. Also {{KEYVALUE_INFRASTRUCTURE_SIZE}} depends on ROW_LENGTH_SIZE which 
is {{Bytes.SIZEOF_SHORT}} My test didn't catch that since you need to go all 
the way to serialize the KV.

I think I'm -1 now for this and the truncate approach might be the only 
alternative for now.


> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201895#comment-16201895
 ] 

Anoop Sam John commented on HBASE-18987:


It is an int but what I mean is while writing Cells to HFile. When a Put is 
been made with RK of length more than Short.MAX_VAL (And with this patch), how 
abt the RK length what we will get back from Cells added into the Put? Will 
that be correct?  The test case just assert Put is been made with out 
Exception.  Can u do an addColumn() to this Put and get back that Cell and see 
the RK len?

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201804#comment-16201804
 ] 

Mike Drob commented on HBASE-18987:
---

bq. {{nameStr}} is being used
Not in {{testOversizedRegionNameForPut}}

bq. But for the row length it is 2 bytes right
{code}
public static final int MAX_ROW_LENGTH = Short.MAX_VALUE;
{code}
max row len is an int here, I think all of the encoders under 
BufferedDataBlockEncoder write it as a vInt.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201750#comment-16201750
 ] 

Anoop Sam John commented on HBASE-18987:


The value length can be upto Integer.MAX_VALUE - 1 as we use 4 bytes to store 
that.  But for the row length it is 2 bytes right? Then allowing 
Integer.MAX_VALUE - 1 for RK length also correct?
As the createRegionName() adding some more bytes it crossed the Short MAX value 
limit.
bq. int len = tableName.getName().length + 2 + id.length +  (startKey == null? 
0: startKey.length);
I believe we should check this while creating the split point.  We should 
consider the extra bytes to be added also and should limit the split point max 
length as per that. Changing the RK max length for this case is not reasonable 
IMO

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201067#comment-16201067
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

[~mdrob],  {{nameStr}} is being used.
{code}
String nameStr = Bytes.toString(name);
assertTrue(nameStr.length() <= HConstants.MAX_ROW_LENGTH);
{code}


> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200932#comment-16200932
 ] 

Mike Drob commented on HBASE-18987:
---

{code}
+String nameStr = Bytes.toString(name);
{code}
This one is unused too, I should have caught it the first time.

+1 assuming tests pass otherwise

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200877#comment-16200877
 ] 

Mike Drob commented on HBASE-18987:
---

{code}
+String md5HashInHex = MD5Hash.getMD5AsHex(name, 0, name.length);
{code}
This is unused?

{code}
+try {
+  Put hri_a = new Put(sk);
+} catch (Exception e) {
+  fail("Put shouldn't have failed with oversized row key");
+}
{code}
Let the exception propagate, or do {{assertNotNull(hri_a)}}. The catch/fail is 
icky, I think.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an  edgy case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)