[jira] [Reopened] (HBASE-23595) HMaster abort when write to meta failed

2021-05-14 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reopened HBASE-23595:
---

> HMaster abort when write to meta failed
> ---
>
> Key: HBASE-23595
> URL: https://issues.apache.org/jira/browse/HBASE-23595
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.2
>Reporter: Lijin Bin
>Priority: Major
>
> RegionStateStore
> {code}
>   private void updateRegionLocation(RegionInfo regionInfo, State state, Put put)
>       throws IOException {
>     try (Table table = master.getConnection().getTable(TableName.META_TABLE_NAME)) {
>       table.put(put);
>     } catch (IOException e) {
>       // TODO: Revisit. Means that if a server is loaded, then we will abort our host!
>       // In tests we abort the Master!
>       String msg = String.format("FAILED persisting region=%s state=%s",
>         regionInfo.getShortNameToLog(), state);
>       LOG.error(msg, e);
>       master.abort(msg, e);
>       throw e;
>     }
>   }
> {code}
> When the regionserver carrying meta stops or crashes and the 
> ServerCrashProcedure has not yet started processing it, the write to meta 
> fails and aborts the master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23595) HMaster abort when write to meta failed

2021-05-14 Thread Esteban Gutierrez (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344694#comment-17344694
 ] 

Esteban Gutierrez commented on HBASE-23595:
---

I've run into this issue a couple of times over the years and I think we should 
do better. We recently experienced this failure with a half-functional host 
that didn't trigger the znode expiration but left the RS serving meta 
unresponsive, which eventually crashed all our Masters after a cascade of 
failovers. What I've been thinking is that we should carry fail-fast configs on 
the connection to meta, inspect the exception from that Put, and add some logic 
to try to re-locate meta by initiating the expiration of the RS serving meta. 
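
A minimal sketch of that idea, in plain Java and without any HBase types. 
MetaWrite, writeWithRecovery and the onFailure hook are hypothetical names used 
only for illustration; in a real patch the hook would be where the Master 
inspects the exception and expires the RS serving meta.

```java
import java.io.IOException;
import java.util.function.Consumer;

/** Hypothetical sketch: bounded meta write that triggers recovery instead of aborting. */
final class MetaWriteSketch {

  /** Stand-in for the Put against hbase:meta; not an HBase type. */
  @FunctionalInterface
  interface MetaWrite {
    void run() throws IOException;
  }

  /**
   * Tries the write at most maxAttempts times. After every failed attempt the
   * onFailure hook is called with the exception, so the caller can inspect it
   * and start recovery (for example, expiring the server that hosts meta).
   * Returns true on success and false when every attempt failed; the decision
   * to abort is left to the caller.
   */
  static boolean writeWithRecovery(MetaWrite write, int maxAttempts,
      Consumer<IOException> onFailure) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        write.run();
        return true;
      } catch (IOException e) {
        onFailure.accept(e);
      }
    }
    return false;
  }
}
```

The point of the sketch is only the control flow: the failure handler runs 
before any abort decision, and the number of attempts is bounded so that a 
stuck meta server cannot block the caller indefinitely.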






[jira] [Commented] (HBASE-25549) A new hbase shell command: 'alter_lazy'

2021-02-04 Thread Esteban Gutierrez (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279002#comment-17279002
 ] 

Esteban Gutierrez commented on HBASE-25549:
---

Can this be a modifier of the existing {{alter}} command instead? I still have 
some doubts about the potential impact of this feature when altering properties 
such as codecs or replication, or when adding a new CF, and about the problems 
that misuse of a feature like this could cause.


> A new hbase shell command: 'alter_lazy'
> ---
>
> Key: HBASE-25549
> URL: https://issues.apache.org/jira/browse/HBASE-25549
> Project: HBase
>  Issue Type: Improvement
>  Components: master, shell
>Affects Versions: 3.0.0-alpha-1
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Under normal circumstances, modifying a table will cause all regions 
> belonging to the table to enter RIT. Imagine the following two scenarios:
>  # Someone enters a wrong configuration (e.g. a negative 
> 'hbase.busy.wait.multiplier.max' value) when altering the table, causing 
> thousands of online regions to fail to open and leading to a production incident.
>  # Someone modifies the configuration of a table, but the modification is not 
> urgent and the regions are not expected to enter RIT immediately.
> 'alter_lazy' is a new command that modifies a table without reopening any 
> online regions, except for regions that are assigned by other threads, split, etc.





[jira] [Resolved] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2020-09-03 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-19352.
---
Fix Version/s: 2.2.6
   2.4.0
   2.3.3
   3.0.0-alpha-1
 Tags: security
   Resolution: Fixed

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.6
>
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include it too.





[jira] [Work started] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2020-09-03 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-19352 started by Esteban Gutierrez.
-
> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include it too.





[jira] [Resolved] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k

2020-03-27 Thread Esteban Gutierrez (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-24041.
---
Fix Version/s: 2.2.5
   2.4.0
   2.3.0
   3.0.0
   Resolution: Fixed

> [regression]  Increase RESTServer buffer size back to 64k
> -
>
> Key: HBASE-24041
> URL: https://issues.apache.org/jira/browse/HBASE-24041
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 3.0.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Esteban Gutierrez
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.4.0, 2.2.5
>
>
> HBASE-14492 is no longer present in our current releases after HBASE-12894. 
> Unfortunately our RESTServer does not extend HttpServer, which means that 
> {{DEFAULT_MAX_HEADER_SIZE}} is not being set and HTTP requests with a very 
> large header can still cause connection issues for clients. A quick fix is 
> just to add the settings to the {{HttpConfiguration}} configuration object. A 
> long-term solution should be to refactor the services that create an HTTP 
> server and normalize all configuration settings across all of them.





[jira] [Created] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k

2020-03-24 Thread Esteban Gutierrez (Jira)
Esteban Gutierrez created HBASE-24041:
-

 Summary: [regression]  Increase RESTServer buffer size back to 64k
 Key: HBASE-24041
 URL: https://issues.apache.org/jira/browse/HBASE-24041
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0, 2.3.0, 2.4.0
Reporter: Esteban Gutierrez


HBASE-14492 is no longer present in our current releases after HBASE-12894. 
Unfortunately our RESTServer does not extend HttpServer, which means that 
{{DEFAULT_MAX_HEADER_SIZE}} is not being set and HTTP requests with a very 
large header can still cause connection issues for clients. A quick fix is just 
to add the settings to the {{HttpConfiguration}} configuration object. A 
long-term solution should be to refactor the services that create an HTTP 
server and normalize all configuration settings across all of them.





[jira] [Commented] (HBASE-22947) Client Should Prompt For Additional Confirmation on System Table DDL

2019-08-29 Thread Esteban Gutierrez (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918747#comment-16918747
 ] 

Esteban Gutierrez commented on HBASE-22947:
---

[~belugabehr], can you please elaborate more on this feature? We strictly 
disallow disabling the {{hbase:meta}} table:

org.apache.hadoop.hbase.master.procedure.DisableTableProcedure#prepareDisable:
{code}
  private boolean prepareDisable(final MasterProcedureEnv env) throws IOException {
    boolean canTableBeDisabled = true;
    if (tableName.equals(TableName.META_TABLE_NAME)) {
      setFailure("master-disable-table",
        new ConstraintException("Cannot disable catalog table"));
      canTableBeDisabled = false;
{code}

Probably we should add {{hbase:namespace}} there too.
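
As a rough illustration of what covering both tables could look like, here is 
a self-contained sketch. SystemTableGuard and canBeDisabled are made-up names, 
and the set-based lookup is an assumption for illustration, not the existing 
DisableTableProcedure code.

```java
import java.util.Set;

/** Hypothetical sketch: reject disable requests for protected system tables. */
final class SystemTableGuard {

  // Table names come from the comment above; keeping them in a Set is an
  // illustrative assumption, not how HBase currently does the check.
  private static final Set<String> NON_DISABLEABLE =
      Set.of("hbase:meta", "hbase:namespace");

  /** Returns false for tables that must never be disabled. */
  static boolean canBeDisabled(String tableName) {
    return !NON_DISABLEABLE.contains(tableName);
  }
}
```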



> Client Should Prompt For Additional Confirmation on System Table DDL
> 
>
> Key: HBASE-22947
> URL: https://issues.apache.org/jira/browse/HBASE-22947
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 2.1.0, 2.0.0
>Reporter: David Mollitor
>Priority: Minor
>
> If a user is going to perform a DDL/disable operation on a system table, the 
> client should print a warning about the risks and prompt with something like 
> "Are you sure?"



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HBASE-22926) REST server should return 504 Gateway Timeout Error on scanner timeout

2019-08-26 Thread Esteban Gutierrez (Jira)
Esteban Gutierrez created HBASE-22926:
-

 Summary: REST server should return 504 Gateway Timeout Error on 
scanner timeout
 Key: HBASE-22926
 URL: https://issues.apache.org/jira/browse/HBASE-22926
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 2.2.0, 2.1.0, 3.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez


Currently, when a scanner timeout occurs on the RS side, the client gets a 
RetriesExhaustedException that makes the client fail. However, from the REST 
server's point of view that is just an IOE:

org.apache.hadoop.hbase.rest.ScannerResultGenerator#next
{code}
    } else {
      Result result = null;
      try {
        result = scanner.next();
      } catch (UnknownScannerException e) {
        throw new IllegalArgumentException(e);
      } catch (TableNotEnabledException tnee) {
        throw new IllegalStateException(tnee);
      } catch (TableNotFoundException tnfe) {
        throw new IllegalArgumentException(tnfe);
      } catch (IOException e) {
        LOG.error(StringUtils.stringifyException(e));
      }
{code}

Now, that empty result is handled as an HTTP 204 response back to the client:

org.apache.hadoop.hbase.rest.ScannerInstanceResource#get
{code}
...
      Cell value = null;
      try {
        value = generator.next();
      } catch (IllegalStateException e) {
...
      } catch (IllegalArgumentException e) {
...
      }
...
      if (value == null) {
        if (LOG.isTraceEnabled()) {
          LOG.trace("generator exhausted");
        }
        // respond with 204 (No Content) if an empty cell set would be
        // returned
        if (count == limit) {
          return Response.noContent().build();
        }
        break;

Obviously this is wrong, since a RetriesExhaustedException is most likely due 
to a failure on the RS side. The correct behavior should be a 504 Gateway 
Timeout Error.
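
A small sketch of the intended mapping, using only the JDK. RowSource and 
statusFor are hypothetical stand-ins for the scanner and the REST resource, and 
treating every IOException as a 504 is a simplification made for the example; 
a real fix would probably single out RetriesExhaustedException.

```java
import java.io.IOException;

/** Hypothetical sketch: surface scanner failures as 504 instead of an empty 204. */
final class ScannerStatusSketch {
  static final int OK = 200;
  static final int NO_CONTENT = 204;
  static final int GATEWAY_TIMEOUT = 504;

  /** Stand-in for the scanner; a null row means the scanner is exhausted. */
  @FunctionalInterface
  interface RowSource {
    String next() throws IOException;
  }

  /** Maps the outcome of one scanner call to an HTTP status code. */
  static int statusFor(RowSource source) {
    try {
      return source.next() == null ? NO_CONTENT : OK;
    } catch (IOException e) {
      // Do not swallow the failure: an exhausted scanner and a failed
      // scanner must not look the same to the REST client.
      return GATEWAY_TIMEOUT;
    }
  }
}
```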








[jira] [Commented] (HBASE-17295) The namespace table has two regions

2019-08-15 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908244#comment-16908244
 ] 

Esteban Gutierrez commented on HBASE-17295:
---

We ran into something very similar recently with one of our users. 
[~baibaichen], even though it has been a long time, can you confirm whether the 
issue was caused by hbck? Thanks.

> The namespace table has two regions
> ---
>
> Key: HBASE-17295
> URL: https://issues.apache.org/jira/browse/HBASE-17295
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chang chen
>Priority: Major
> Attachments: bug.PNG
>
>
> From the code, the hbase namespace table should not be allowed to split.
> {code:title=HRegion#checkSplit}
> public byte[] checkSplit() {
>   // Can't split META
>   if (this.getRegionInfo().isMetaTable() ||
>       TableName.NAMESPACE_TABLE_NAME.equals(this.getRegionInfo().getTable())) {
>     if (shouldForceSplit()) {
>       LOG.warn("Cannot split meta region in HBase 0.20 and above");
>     }
>     return null;
>   }
>   //.
> }
> {code}
> But recently I saw two namespace regions in our production deployment. It 
> may be caused by restarting when the cluster is in a certain state.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HBASE-22286) License handling incorrectly lists CDDL/GPLv2+CE as safe to not aggregate

2019-04-22 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823342#comment-16823342
 ] 

Esteban Gutierrez commented on HBASE-22286:
---

+1

> License handling incorrectly lists CDDL/GPLv2+CE as safe to not aggregate
> -
>
> Key: HBASE-22286
> URL: https://issues.apache.org/jira/browse/HBASE-22286
> Project: HBase
>  Issue Type: Bug
>  Components: build, community
>Affects Versions: 3.0.0, 2.3.0, 2.1.5, 2.2.1
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
> Attachments: HBASE-22286.0.patch
>
>
> The template LICENSE/NOTICE stuff currently has cddl/gplv2+ce listed as an 
> acceptable license for dependencies for individual listing.
> LICENSE.vm
> {code}
> ## Whitelist of lower-case licenses that it's safe to not aggregate as above.
> ## Note that this doesn't include ALv2 or the aforementioned aggregate
> ## license mentions.
> ##
> ## See this FAQ link for justifications: https://www.apache.org/legal/resolved.html
> ##
> ## NB: This list is later compared as lower-case. New entries must also be all lower-case
> #set($non_aggregate_fine = [ 'public domain', 'new bsd license', 'bsd license', 'bsd', 'bsd 2-clause license', 'mozilla public license version 1.1', 'mozilla public license version 2.0', 'creative commons attribution license, version 2.5', 'cddl/gplv2+ce' ])
> {code}
> This is not correct. We have to expressly say we're using the CDDL license 
> for those works because we can't provide downstream with the option of 
> GPLv2+CE. Also we have aggregate licensing handling for CDDL licensed works 
> and this is making us miss times when dependencies are supposed to show up 
> under one of them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2019-04-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820478#comment-16820478
 ] 

Esteban Gutierrez edited comment on HBASE-22263 at 4/17/19 8:27 PM:


[~apurtell]:
bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.
That's exactly the same approach I tried, but due to the urgency of getting the 
cluster online and seeing the region assignment stagnate, we had to go via a 
different route, e.g. failing over multiple times. 


was (Author: esteban):
bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.


[jira] [Commented] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2019-04-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820478#comment-16820478
 ] 

Esteban Gutierrez commented on HBASE-22263:
---

bq. What we did to recover in our case was set the namespace init timeout very 
high, removed the master proc wal, and then brought up a master and waited 
until it cleared things out and came up.

> Master creates duplicate ServerCrashProcedure on initialization, leading to 
> assignment hanging in region-dense clusters
> ---
>
> Key: HBASE-22263
> URL: https://issues.apache.org/jira/browse/HBASE-22263
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
>
> h3. Problem:
> During Master initialization we
>  # restore existing procedures that still need to run from prior active 
> Master instances
>  # look for signs that Region Servers have died and need to be recovered 
> while we were out and schedule a ServerCrashProcedure (SCP) for each of them
>  # turn on the assignment manager
> The normal turn of events for a ServerCrashProcedure will attempt to use a 
> bulk assignment to maintain the set of regions on a RS if possible. However, 
> we wait around and retry a bit later if the assignment manager isn’t ready 
> yet.
> Note that currently #2 has no notion of whether or not a previous active 
> Master instance has already done a check. This means we might schedule an 
> SCP for a ServerName (host, port, start code) that already has an SCP 
> scheduled. Ideally, such a duplicate should be a no-op.
> However, before step #2 schedules the SCP it first marks the region server as 
> dead and not yet processed, with the expectation that the SCP it just created 
> will check whether there is log splitting work and then mark the server as ready for 
> region assignment. At the same time, any restored SCPs that are past the step 
> of log splitting will be waiting for the AssignmentManager still. As a part 
> of restoring themselves, they do not update with the current master instance 
> to show that they are past the point of WAL processing.
> Once the AssignmentManager starts in #3 the restored SCP continues; it will 
> eventually get to the assignment phase and find that its server is marked as 
> dead and in need of wal processing. Such assignments are skipped with a log 
> message. Thus as we iterate over the regions to assign we’ll skip all of 
> them. This non-intuitively shifts the “no-op” status from the newer SCP we 
> scheduled at #2 to the older SCP that was restored in #1.
> Bulk assignment works by sending the assign calls via a pool to allow more 
> parallelism. Once we’ve set up the pool we just wait to see if the region 
> state updates to online. Unfortunately, since all of the assigns got skipped, 
> we’ll never change the state for any of these regions. That means the bulk 
> assign, and the older SCP that started it, will wait until it hits a timeout.
> By default the timeout for a bulk assignment is the smaller of {{(# Regions 
> in the plan * 10s)}} or {{(# Regions in the most loaded RS in the plan * 1s + 
> 60s + # of RegionServers in the cluster * 30s)}}. For even modest clusters 
> with several hundreds of regions per region server, this means the “no-op” 
> SCP will end up waiting ~tens-of-minutes (e.g. ~50 minutes for an average 
> region density of 300 regions per region server on a 100 node cluster. ~11 
> minutes for 300 regions per region server on a 10 node cluster). During this 
> time, the SCP will hold one of the available procedure execution slots for 
> both the overall pool and for the specific server queue.
> As previously mentioned, restored SCPs will retry their submission if the 
> assignment manager has not yet been activated (done in #3), this can cause 
> them to be scheduled after the newer SCPs (created in #2). Thus the order of 
> execution of no-op and usable SCPs can vary from run-to-run of master 
> initialization.
> This means that unless you get lucky with SCP ordering, impacted regions will 
> remain as RIT for an extended period of time. If you get particularly unlucky 
> and a critical system table is included in the regions that are being 
> recovered, then master initialization itself will end up blocked on this 
> sequence of SCP timeouts. If there are enough of them to exceed the master 
> initialization timeouts, then the situation can be self-sustaining as 
> additional master failovers cause even more duplicate SCPs to be scheduled.
> h3. Indicators:
>  * Master appears to hang; failing to assign regions to available region 
> servers.
>  * Master appears to hang 
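
To make the timeout arithmetic in the quoted description concrete, here is a 
small worked sketch of the formula as stated there. The class and method names 
are made up, and the inputs assume that the plan covers the 300 regions of a 
single crashed region server, which is an assumption and not something the 
description spells out.

```java
/** Hypothetical sketch of the bulk assignment timeout described in the issue. */
final class BulkAssignTimeoutSketch {

  /**
   * Returns the timeout in seconds: the smaller of
   * (regions in the plan * 10s) and
   * (regions on the most loaded RS in the plan * 1s + 60s + region servers * 30s).
   */
  static long timeoutSeconds(long regionsInPlan, long maxRegionsOnOneServer,
      long regionServers) {
    long perRegionBound = regionsInPlan * 10L;
    long perServerBound = maxRegionsOnOneServer * 1L + 60L + regionServers * 30L;
    return Math.min(perRegionBound, perServerBound);
  }
}
```

Under that assumption, timeoutSeconds(300, 300, 100) gives min(3000, 3360) = 
3000 seconds, i.e. 50 minutes, and timeoutSeconds(300, 300, 10) gives 
min(3000, 660) = 660 seconds, i.e. 11 minutes, which lines up with the figures 
quoted for the 100 node and 10 node clusters.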

[jira] [Commented] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819413#comment-16819413
 ] 

Esteban Gutierrez commented on HBASE-22253:
---

bq. related: if we are leader and the leader znode is deleted we should step 
down
Yeah, probably we should make sure that the session timeout for the keymaster 
znode is shorter than the sleep interval for the LeaderElector.
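
A rough sketch of both checks, written against plain Java rather than the 
ZooKeeper API. LeaderSketch, onLeaderZnodeChanged and timeoutsConsistent are 
hypothetical names, and the owner id passed in stands for whatever the leader 
znode currently contains.

```java
import java.util.Optional;

/** Hypothetical sketch: step down when the leader znode is gone or owned by someone else. */
final class LeaderSketch {
  private final String selfId;
  private boolean leader;

  LeaderSketch(String selfId) {
    this.selfId = selfId;
  }

  boolean isLeader() {
    return leader;
  }

  void becomeLeader() {
    leader = true;
  }

  /**
   * Re-checks leadership against the current owner of the leader znode.
   * An empty owner models a deleted znode. In both the "deleted" and the
   * "owned by another id" case the current leader steps down, which would
   * also be the place to stop rolling keys.
   */
  void onLeaderZnodeChanged(Optional<String> currentOwner) {
    boolean stillOurs = currentOwner.filter(selfId::equals).isPresent();
    if (leader && !stillOurs) {
      leader = false;
    }
  }

  /** The relation suggested above: session timeout shorter than the elector sleep interval. */
  static boolean timeoutsConsistent(long sessionTimeoutMs, long electorSleepMs) {
    return sessionTimeoutMs < electorSleepMs;
  }
}
```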

> An AuthenticationTokenSecretManager leader won't step down if another RS 
> claims to be a leader
> --
>
> Key: HBASE-22253
> URL: https://issues.apache.org/jira/browse/HBASE-22253
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
>
> We ran into a situation where a rogue Lily HBase Indexer [SEP 
> Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
>  sharing the same {{zookeeper.znode.parent}} claimed to be the 
> AuthenticationTokenSecretManager leader for an HBase cluster. This situation 
> is undesirable, since the leader running on the HBase cluster doesn't step down 
> when the rogue leader registers in the HBase cluster, and both will start 
> rolling keys with the same IDs, causing authentication errors. Even though a 
> reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
> should make sure that we step down as leader correctly.





[jira] [Assigned] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-22253:
-

Assignee: Esteban Gutierrez






[jira] [Updated] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-22253:
--
Description: 
We ran into a situation were a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be 
AuthenticationTokenSecretManager for an HBase cluster. This situation 
undesirable since the leader running on the HBase cluster doesn't steps down 
when the rogue leader registers in the HBase cluster and both will start 
rolling keys with the same IDs causing authentication errors. Even a reasonable 
"fix" is to point to a different {{zookeeper.znode.parent}}, we should make 
sure that we step down as leader correctly.


  was:
We ran into a situation were a rouge Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be 
AuthenticationTokenSecretManager for an HBase cluster. This situation 
undesirable since the leader running on the HBase cluster doesn't steps down 
when the rogue leader registers in the HBase cluster and both will start 
rolling keys with the same IDs causing authentication errors. Even a reasonable 
"fix" is to point to a different {{zookeeper.znode.parent}}, we should make 
sure that we step down as leader correctly.



> An AuthenticationTokenSecretManager leader won't step down if another RS 
> claims to be a leader
> --
>
> Key: HBASE-22253
> URL: https://issues.apache.org/jira/browse/HBASE-22253
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Esteban Gutierrez
>Priority: Critical
>
> We ran into a situation were a rogue Lily HBase Indexer [SEP 
> Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
>  sharing the same {{zookeeper.znode.parent}} claimed to be 
> AuthenticationTokenSecretManager for an HBase cluster. This situation 
> undesirable since the leader running on the HBase cluster doesn't steps down 
> when the rogue leader registers in the HBase cluster and both will start 
> rolling keys with the same IDs causing authentication errors. Even a 
> reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
> should make sure that we step down as leader correctly.





[jira] [Updated] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-22253:
--
Description: 
We ran into a situation where a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be the 
AuthenticationTokenSecretManager for an HBase cluster. This situation is 
undesirable since the leader running on the HBase cluster doesn't step down 
when the rogue leader registers in the HBase cluster, and both will start 
rolling keys with the same IDs, causing authentication errors. Even though a 
reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
should make sure that we step down as leader correctly.


  was:
We ran into a situation were a rouge Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be 
AuthenticationTokenSecretManager for an HBase cluster. This situation 
undesirable since the leader running on the HBase cluster doesn't steps down 
when the rouge leader registers in the HBase cluster and both will start 
rolling keys with the same IDs causing authentication errors. Even a reasonable 
"fix" is to point to a different {{zookeeper.znode.parent}}, we should make 
sure that we step down as leader correctly.



> An AuthenticationTokenSecretManager leader won't step down if another RS 
> claims to be a leader
> --
>
> Key: HBASE-22253
> URL: https://issues.apache.org/jira/browse/HBASE-22253
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Esteban Gutierrez
>Priority: Critical
>
> We ran into a situation where a rogue Lily HBase Indexer [SEP 
> Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
>  sharing the same {{zookeeper.znode.parent}} claimed to be the 
> AuthenticationTokenSecretManager for an HBase cluster. This situation is 
> undesirable since the leader running on the HBase cluster doesn't step down 
> when the rogue leader registers in the HBase cluster, and both will start 
> rolling keys with the same IDs, causing authentication errors. Even though a 
> reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
> should make sure that we step down as leader correctly.





[jira] [Created] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader

2019-04-16 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-22253:
-

 Summary: An AuthenticationTokenSecretManager leader won't step 
down if another RS claims to be a leader
 Key: HBASE-22253
 URL: https://issues.apache.org/jira/browse/HBASE-22253
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 2.1.0, 3.0.0, 2.2.0
Reporter: Esteban Gutierrez


We ran into a situation where a rogue Lily HBase Indexer [SEP 
Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169]
 sharing the same {{zookeeper.znode.parent}} claimed to be the 
AuthenticationTokenSecretManager for an HBase cluster. This situation is 
undesirable since the leader running on the HBase cluster doesn't step down 
when the rogue leader registers in the HBase cluster, and both will start 
rolling keys with the same IDs, causing authentication errors. Even though a 
reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we 
should make sure that we step down as leader correctly.






[jira] [Commented] (HBASE-22116) HttpDoAsClient to support keytab and principal in command line argument.

2019-03-27 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803034#comment-16803034
 ] 

Esteban Gutierrez commented on HBASE-22116:
---

Thanks for this improvement, [~subrat.mishra]. Could you please address the 
javac errors reported by Hadoop QA?

> HttpDoAsClient to support keytab and principal in command line argument.
> 
>
> Key: HBASE-22116
> URL: https://issues.apache.org/jira/browse/HBASE-22116
> Project: HBase
>  Issue Type: Improvement
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Major
> Attachments: HBASE-22116.master.001.patch, 
> HBASE-22116.master.002.patch
>
>
> Currently, HttpDoAsClient relies only on kinit. It would be good to add 
> support for passing a keytab and principal as command-line arguments. 





[jira] [Resolved] (HBASE-22019) Ability to remotely connect to hbase when hbase/zook is hosted on dynamic IP addresses

2019-03-08 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-22019.
---
Resolution: Invalid

Thanks for reporting this, [~toopt4]. Please reach out to the HBase user [mailing 
list|https://hbase.apache.org/mailing-lists.html] for this type of problem, 
since many users have been able to connect to HBase clusters without any 
problem regardless of whether HBase is on a different network. 

> Ability to remotely connect to hbase when hbase/zook is hosted on dynamic IP 
> addresses
> --
>
> Key: HBASE-22019
> URL: https://issues.apache.org/jira/browse/HBASE-22019
> Project: HBase
>  Issue Type: New Feature
>  Components: IPC/RPC, Zookeeper
>Reporter: t oo
>Priority: Major
>
> Our team's need for this is purely for remote connections (i.e. personal 
> laptops) to HBase (hosted on EC2) to work, since HBase connections under the 
> covers connect to ZooKeeper (also running on EC2) and attempt to resolve the 
> hostname (not DNS!) of the machine running ZooKeeper. From what I've read, 
> others are facing the issue:
> https://forums.aws.amazon.com/thread.jspa?threadID=119915
> https://stackoverflow.com/questions/30751187/unable-to-connect-to-hbase-stand-alone-server-from-windows-remote-client
> https://sematext.com/opensee/m/HBase/YGbbw6MGk1B9nCv?subj=Re:+Remote+Java+client+connection+into+EC2+instance
> https://community.cloudera.com/t5/Storage-Random-Access-HDFS/Problem-in-connectivity-between-HBase-amp-JAVA/td-p/1693
> https://stackoverflow.com/questions/9413481/hbase-node-could-not-be-reached-from-hbase-java-api-client
> https://groups.google.com/forum/#!topic/opentsdb/3w4FCnPYRDg
> Between ec2s I don't get the below error because I can edit /etc/hosts to add 
> the host name below but don't have root/admin access on other machines to do 
> the same. Problem is if we have 100s of users wanting to connect to hbase 
> data then they would all face this /etc/hosts issue.
> Example of the error:
> 19/03/01 17:02:14 WARN client.ConnectionUtils: Can not resolve 
> ip-10x.com, please check your network
> java.net.UnknownHostException: ip-10x.com: Name or service not known
>  at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>  at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>  at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>  at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>  at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>  at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>  at java.net.InetAddress.getByName(InetAddress.java:1077)
>  at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1189)
>  at 
> org.apache.hadoop.hbase.client.ClientServiceCallable.setStubByServiceName(ClientServiceCallable.java:44)
>  at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:229)
>  at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>  at org.apache.hadoop.hbase.client.HTable.get(HTable.java:386)
>  at org.apache.hadoop.hbase.client.HTable.get(HTable.java:360)
>  at 
> org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1066)
>  at 
> org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:389)
>  at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:437)
>  at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:434)
>  at 
> org.apache.hadoop.hbase.client.RpcRetryingCallable.call(RpcRetryingCallable.java:58)
>  at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
>  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3055)
>  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3047)
>  at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:434)





[jira] [Commented] (HBASE-21915) FileLink$FileLinkInputStream doesn't implement CanUnbuffer

2019-02-19 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772263#comment-16772263
 ] 

Esteban Gutierrez commented on HBASE-21915:
---

+1

> FileLink$FileLinkInputStream doesn't implement CanUnbuffer
> --
>
> Key: HBASE-21915
> URL: https://issues.apache.org/jira/browse/HBASE-21915
> Project: HBase
>  Issue Type: Bug
>  Components: Filesystem Integration
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Attachments: HBASE-21915.001.patch, HBASE-21915.002.patch
>
>
> FileLinkInputStream is an InputStream which handles the indirection of where 
> the real HFile lives. This implementation is wrapped via 
> FSDataInputStreamWrapper and is transparent when it's being used by a caller. 
> Often, we have an FSDataInputStreamWrapper wrapping a FileLinkInputStream 
> which wraps an FSDataInputStream.
> The problem is that FileLinkInputStream does not implement the 
> {{CanUnbuffer}} interface, which means that the underlying 
> {{FSDataInputStream}} for the HFile the link refers to doesn't get 
> {{unbuffer()}} called on it. This can cause an open Socket to hang around, as 
> described in HBASE-9393.
> Both [~wchevreuil] and myself have run into this, each for different users. 
> We think the commonality as to why these users saw this (but we haven't run 
> into it on our own) is that it requires a very large snapshot to be brought 
> into a new system. Big kudos to [~esteban] for his help in diagnosing this as 
> well!
> If this analysis is accurate, it would affect all branches.
>  
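To illustrate the delegation gap described in the issue, here is a minimal, self-contained sketch. It does not use the real Hadoop or HBase classes: the {{CanUnbuffer}} interface below is a local stand-in declared only so the example compiles on its own, and {{InnerStream}}, {{LinkStream}}, and {{release}} are hypothetical names. It is not the attached patch, only one way a wrapping stream could forward {{unbuffer()}} to the stream it wraps.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: every type here is a local stand-in, not a Hadoop/HBase class.
final class UnbufferSketch {

    /** Local stand-in for an "unbuffer" capability interface. */
    interface CanUnbuffer {
        void unbuffer();
    }

    /** Hypothetical inner stream that holds a resource (think: a socket) until unbuffer() is called. */
    static final class InnerStream extends ByteArrayInputStream implements CanUnbuffer {
        boolean resourceHeld = true;

        InnerStream(byte[] data) {
            super(data);
        }

        @Override
        public void unbuffer() {
            resourceHeld = false;
        }
    }

    /** Hypothetical link-style wrapper that forwards both reads and unbuffer() to the wrapped stream. */
    static final class LinkStream extends InputStream implements CanUnbuffer {
        private final InputStream in;

        LinkStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public void unbuffer() {
            // Forward only if the wrapped stream supports it; otherwise do nothing.
            if (in instanceof CanUnbuffer) {
                ((CanUnbuffer) in).unbuffer();
            }
        }
    }

    /** What an outer caller does: it can only release resources if the outermost stream exposes unbuffer(). */
    static void release(InputStream stream) {
        if (stream instanceof CanUnbuffer) {
            ((CanUnbuffer) stream).unbuffer();
        }
    }

    public static void main(String[] args) {
        InnerStream inner = new InnerStream(new byte[] {1, 2, 3});
        release(new LinkStream(inner));
        System.out.println("resource still held: " + inner.resourceHeld);
    }
}
```

If the wrapper did not implement the interface, {{release}} would silently skip it and the inner resource would stay held, which is the shape of the problem reported here.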





[jira] [Commented] (HBASE-21034) Add new throttle type: read/write capacity unit

2019-01-18 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746427#comment-16746427
 ] 

Esteban Gutierrez commented on HBASE-21034:
---

[~zghaobac]:
{quote}
This feature is small. And it was backported to branch-1. If we don't backport 
this to branch-2.1 and a user uses this feature in a 1.x version, then they 
can't do a rolling upgrade to a 2.1.* version?
{quote}
That can obviously also be the case while performing a rolling upgrade to a 
previous maintenance release from branch-2.1, and that's why it is important to 
keep this kind of thing from happening. Even if this is a small feature, as a 
few have mentioned here, it adds a few changes to our protobuf specs, and I 
think that's quite a stretch in a maintenance release. 


> Add new throttle type: read/write capacity unit
> ---
>
> Key: HBASE-21034
> URL: https://issues.apache.org/jira/browse/HBASE-21034
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21034.branch-2.0.001.patch, 
> HBASE-21034.branch-2.0.001.patch, HBASE-21034.branch-2.1.001.patch, 
> HBASE-21034.branch-2.1.001.patch, HBASE-21034.master.001.patch, 
> HBASE-21034.master.002.patch, HBASE-21034.master.003.patch, 
> HBASE-21034.master.004.patch, HBASE-21034.master.005.patch, 
> HBASE-21034.master.006.patch, HBASE-21034.master.006.patch, 
> HBASE-21034.master.007.patch, HBASE-21034.master.007.patch
>
>
> Add new throttle type: read/write capacity unit, like DynamoDB.
> One read capacity unit represents one read of up to 1 KB of data per time 
> unit. If the data size is more than 1 KB, additional read capacity units are 
> consumed.
> One write capacity unit represents one write of an item up to 1 KB in size 
> per time unit. If the data size is more than 1 KB, additional write capacity 
> units are consumed.
> For example, 100 read capacity units per second means that an HBase user can 
> read 1 KB of data 100 times every second, or 2 KB of data 50 times every 
> second, and so on.
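The arithmetic in the description can be sketched as follows. This is only an illustration of the rounding rule as described above, not HBase's quota implementation: the class and method names are hypothetical, 1 KB is taken to mean 1024 bytes, and charging at least one unit for a very small request is an assumption of this sketch.

```java
// Sketch only: names are hypothetical and this is not the HBase quota API.
final class CapacityUnitSketch {

    // Assumption: one capacity unit covers up to 1 KB, taken here as 1024 bytes.
    static final long UNIT_BYTES = 1024L;

    /** Units consumed by one request: size is rounded up to whole units, with a minimum of one unit (assumed). */
    static long unitsConsumed(long sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("size must not be negative: " + sizeBytes);
        }
        long roundedUp = (sizeBytes + UNIT_BYTES - 1) / UNIT_BYTES;
        return Math.max(1L, roundedUp);
    }

    /** How many requests of the given size fit into a per-second budget of capacity units. */
    static long requestsPerSecond(long unitsPerSecond, long sizeBytes) {
        return unitsPerSecond / unitsConsumed(sizeBytes);
    }

    public static void main(String[] args) {
        System.out.println(requestsPerSecond(100, 1024)); // 1 KB requests against 100 units/s
        System.out.println(requestsPerSecond(100, 2048)); // 2 KB requests against 100 units/s
    }
}
```

With a budget of 100 units per second this gives 100 requests of 1 KB or 50 requests of 2 KB, matching the example in the description.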





[jira] [Commented] (HBASE-21034) Add new throttle type: read/write capacity unit

2019-01-17 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745755#comment-16745755
 ] 

Esteban Gutierrez commented on HBASE-21034:
---

I'm -1 on having this new feature in a maintenance release. I think the right 
approach is to revert it. It won't be a good precedent to let this go 
through, as [~sershe] said.

> Add new throttle type: read/write capacity unit
> ---
>
> Key: HBASE-21034
> URL: https://issues.apache.org/jira/browse/HBASE-21034
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21034.branch-2.0.001.patch, 
> HBASE-21034.branch-2.0.001.patch, HBASE-21034.branch-2.1.001.patch, 
> HBASE-21034.branch-2.1.001.patch, HBASE-21034.master.001.patch, 
> HBASE-21034.master.002.patch, HBASE-21034.master.003.patch, 
> HBASE-21034.master.004.patch, HBASE-21034.master.005.patch, 
> HBASE-21034.master.006.patch, HBASE-21034.master.006.patch, 
> HBASE-21034.master.007.patch, HBASE-21034.master.007.patch
>
>
> Add new throttle type: read/write capacity unit, like DynamoDB.
> One read capacity unit represents one read of up to 1 KB of data per time 
> unit. If the data size is more than 1 KB, additional read capacity units are 
> consumed.
> One write capacity unit represents one write of an item up to 1 KB in size 
> per time unit. If the data size is more than 1 KB, additional write capacity 
> units are consumed.
> For example, 100 read capacity units per second means that an HBase user can 
> read 1 KB of data 100 times every second, or 2 KB of data 50 times every 
> second, and so on.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-08 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679956#comment-16679956
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

Thanks [~apurtell].

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.
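The distinction between a partial read and a real end of stream can be sketched in plain Java. This is not the attached patch and does not use {{ProtobufLogReader}} or {{CryptoInputStream}}: {{ChunkedStream}} is a hypothetical stand-in for a stream that returns fewer bytes than requested, and {{readFully}} is the usual loop (the same idea as {{java.io.DataInputStream#readFully}}) that treats only a return value of -1 as EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: a generic illustration of short reads, not HBase WAL reader code.
final class PartialReadSketch {

    /** Hypothetical stream that returns at most `chunk` bytes per call, even when more are available. */
    static final class ChunkedStream extends ByteArrayInputStream {
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            super(data);
            this.chunk = chunk;
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    /** Keeps reading until `len` bytes have arrived; only a real end of stream (-1) is reported as EOF. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, off + done, len - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done + " of " + len + " bytes");
            }
            done += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

        // A single read() call comes back short although the stream is not exhausted.
        byte[] once = new byte[10];
        int got = new ChunkedStream(data, 3).read(once, 0, 10);
        System.out.println("single read returned " + got + " bytes");

        // Looping until the requested length is reached recovers the whole record.
        byte[] all = new byte[10];
        readFully(new ChunkedStream(data, 3), all, 0, 10);
        System.out.println("readFully returned all " + all.length + " bytes");
    }
}
```

A caller that treats the short result of a single read as end of data would rewind and retry, which is the kind of loop the description reports.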





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-07 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.005.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-06 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.004.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-06 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677373#comment-16677373
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

Thanks for the quick review [~busbey], please see the updated change.

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-06 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.003.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-02 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673603#comment-16673603
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

[~mdrob] I looked into that, and even though it seems related, we are doing 
positional reads and there is no pre-fetching involved. 

[~apurtell] we have been running this in a production environment for months 
and we haven't run into an issue. Also, {{entry.getEdit().readFromCells}} needs 
to trigger a mismatch of the consumed entries vs. the expected entries, or see 
an {{InvalidProtocolBufferException}} while consuming the WAL, before seeking 
to {{originalPosition}}. So far, I think it is safe to commit at this point if 
you are OK with the change. Thanks!

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-09-05 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604852#comment-16604852
 ] 

Esteban Gutierrez commented on HBASE-21154:
---

+1

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Priority: Major
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Let's just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW       COLUMN+CELL
>  default  column=info:d, timestamp=1526694059106, value=\x0A\x07default
>  hbase    column=info:d, timestamp=1526694059461, value=\x0A\x05hbase
> {code}
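The cell values in the scan output read like protobuf wire format: {{\x0A}} is the tag byte for field 1 with wire type 2 (length-delimited), and the next byte ({{\x07}} or {{\x05}}) is the length of the string that follows. The sketch below decodes exactly that shape by hand. It is an illustration only: the class and method names are hypothetical, it handles only a single-byte length, and it assumes (without consulting the HBase protobuf definitions) that the first field holds the namespace name.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: hand-decodes one length-delimited protobuf field; not HBase code.
final class NamespaceValueSketch {

    /** Field number encoded in the first (tag) byte: the upper bits above the 3-bit wire type. */
    static int fieldNumber(byte[] value) {
        return (value[0] & 0xFF) >>> 3;
    }

    /** Decodes the first field as a UTF-8 string, assuming wire type 2 and a single-byte length. */
    static String decodeFirstStringField(byte[] value) {
        if (value.length < 2) {
            throw new IllegalArgumentException("value too short");
        }
        int wireType = value[0] & 0x07;
        if (wireType != 2) {
            throw new IllegalArgumentException("not a length-delimited field, wire type " + wireType);
        }
        if ((value[1] & 0x80) != 0) {
            throw new IllegalArgumentException("multi-byte lengths are not handled in this sketch");
        }
        int len = value[1] & 0xFF;
        if (value.length < 2 + len) {
            throw new IllegalArgumentException("declared length exceeds available bytes");
        }
        return new String(value, 2, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The bytes shown as \x0A\x07default in the scan output.
        byte[] defaultNs = {0x0A, 0x07, 'd', 'e', 'f', 'a', 'u', 'l', 't'};
        // The bytes shown as \x0A\x05hbase in the scan output.
        byte[] hbaseNs = {0x0A, 0x05, 'h', 'b', 'a', 's', 'e'};
        System.out.println(fieldNumber(defaultNs) + " -> " + decodeFirstStringField(defaultNs));
        System.out.println(fieldNumber(hbaseNs) + " -> " + decodeFirstStringField(hbaseNs));
    }
}
```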





[jira] [Commented] (HBASE-21134) Add guardrails to cell tags in order to avoid the tags length to overflow

2018-09-05 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604642#comment-16604642
 ] 

Esteban Gutierrez commented on HBASE-21134:
---

Thanks for the update there, [~mdrob]. It's correct that the issue happens with 
ACLs, but it can also be triggered with Visibility Labels. Basically, you only 
need to pass a Map big enough to exceed 32KB at serialization time in order to 
break HFileV3.

> Add guardrails to cell tags in order to avoid the tags length to overflow 
> --
>
> Key: HBASE-21134
> URL: https://issues.apache.org/jira/browse/HBASE-21134
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
>
> We found that per-cell tags can easily overflow and cause failures while 
> reading HFiles. If a mutation has more than 32KB for the byte array with the 
> tags, we should reject the operation on the client side (proactively) and on 
> the server side as we deserialize the request.
> {code}
> 2018-08-21 11:08:45,387 ERROR 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed 
> Request = regionName=table1,,1534870486680.9112ca53504084152da5e28116f40ec2., 
> storeName=c1, fileCount=4, fileSize=254.2 K (138.0 K, 33.5 K, 34.0 K, 48.7 
> K), priority=1, time=8555785624243
> java.lang.IllegalStateException: Invalid currTagsLen -20658. Block offset: 0, 
> block length: 44912, position: 0 (without header).
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.checkTagsLen(HFileReaderV3.java:226)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.readKeyValueLen(HFileReaderV3.java:251)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.updateCurrBlock(HFileReaderV2.java:956)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:919)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:304)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:269)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:231)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:414)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:91)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>   at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1247)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1915)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:529)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
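One plausible way to end up with a negative tags length like the one in the log is a length that no longer fits in a signed 16-bit field. The sketch below shows that wrap-around and a guardrail of the kind the description asks for. It is not HBase code: the names are hypothetical, the signed 16-bit storage and the {{Short.MAX_VALUE}} limit are assumptions of this sketch, and 44878 is just an illustrative length that happens to wrap to the logged value.

```java
// Sketch only: illustrates a 16-bit length overflow and an up-front check; not HBase code.
final class TagsLengthSketch {

    // Assumption: the tags length is stored in a signed 16-bit field.
    static final int MAX_TAGS_LENGTH = Short.MAX_VALUE;

    /** What a length looks like after being squeezed into a signed 16-bit field. */
    static short asStoredLength(int tagsLength) {
        return (short) tagsLength;
    }

    /** Guardrail: reject oversized tag arrays before they are serialized. */
    static void checkTagsLength(int tagsLength) {
        if (tagsLength < 0 || tagsLength > MAX_TAGS_LENGTH) {
            throw new IllegalArgumentException(
                "tags length " + tagsLength + " exceeds the limit of " + MAX_TAGS_LENGTH + " bytes");
        }
    }

    public static void main(String[] args) {
        // Illustrative length above 32KB: it wraps to a negative number once stored as a short.
        System.out.println(asStoredLength(44878));

        try {
            checkTagsLength(44878);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running the same check on both the client and the server, as proposed above, would turn a later read failure into an immediate rejection of the mutation.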





[jira] [Assigned] (HBASE-21134) Add guardrails to cell tags in order to avoid the tags length to overflow

2018-08-30 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-21134:
-

Assignee: Esteban Gutierrez

> Add guardrails to cell tags in order to avoid the tags length to overflow 
> --
>
> Key: HBASE-21134
> URL: https://issues.apache.org/jira/browse/HBASE-21134
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
>
> We found that per-cell tags can easily overflow and cause failures while 
> reading HFiles. If a mutation has more than 32KB for the byte array with the 
> tags, we should reject the operation on the client side (proactively) and on 
> the server side as we deserialize the request.
> {code}
> 2018-08-21 11:08:45,387 ERROR 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed 
> Request = regionName=table1,,1534870486680.9112ca53504084152da5e28116f40ec2., 
> storeName=c1, fileCount=4, fileSize=254.2 K (138.0 K, 33.5 K, 34.0 K, 48.7 
> K), priority=1, time=8555785624243
> java.lang.IllegalStateException: Invalid currTagsLen -20658. Block offset: 0, 
> block length: 44912, position: 0 (without header).
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.checkTagsLen(HFileReaderV3.java:226)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.readKeyValueLen(HFileReaderV3.java:251)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.updateCurrBlock(HFileReaderV2.java:956)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:919)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:304)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:269)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:231)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:414)
>   at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:91)
>   at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
>   at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1247)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1915)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:529)
>   at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21134) Add guardrails to cell tags in order to avoid the tags length to overflow

2018-08-30 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-21134:
-

 Summary: Add guardrails to cell tags in order to avoid the tags 
length to overflow 
 Key: HBASE-21134
 URL: https://issues.apache.org/jira/browse/HBASE-21134
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Esteban Gutierrez


We found that per-cell tags can easily overflow and cause failures while 
reading HFiles. If the byte array holding a mutation's tags is larger than 
32KB, we should reject the operation on the client side (proactively) and on 
the server side as we deserialize the request.
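
The overflow and the proposed guardrail can be sketched as follows. This is a minimal illustration and not the actual HBase code: the class and method names are hypothetical, and it assumes the tags length is narrowed to a signed 2-byte value at some point on the read or write path. Under that assumption, a 44878-byte tags array would read back as -20658, which matches the value in the stack trace below.

```java
final class TagsLengthGuard {
    // Assumption for this sketch: the tags length must fit in a signed 2-byte value.
    static final int MAX_TAGS_LENGTH = Short.MAX_VALUE; // 32767

    /** Shows what a reader sees when a too-large length is narrowed to 2 bytes. */
    static short asStoredLength(int tagsLength) {
        return (short) tagsLength;
    }

    /** Hypothetical guard: reject a mutation whose tags would not fit. */
    static void checkTagsLength(int tagsLength) {
        if (tagsLength < 0 || tagsLength > MAX_TAGS_LENGTH) {
            throw new IllegalArgumentException(
                "Tags length " + tagsLength + " exceeds the maximum of " + MAX_TAGS_LENGTH);
        }
    }
}
```

The same check could run once on the client before the mutation is sent and once on the server while the request is deserialized, so that an oversized tags array never reaches an HFile.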

{code}
2018-08-21 11:08:45,387 ERROR 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed 
Request = regionName=table1,,1534870486680.9112ca53504084152da5e28116f40ec2., 
storeName=c1, fileCount=4, fileSize=254.2 K (138.0 K, 33.5 K, 34.0 K, 48.7 K), 
priority=1, time=8555785624243
java.lang.IllegalStateException: Invalid currTagsLen -20658. Block offset: 0, 
block length: 44912, position: 0 (without header).
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.checkTagsLen(HFileReaderV3.java:226)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.readKeyValueLen(HFileReaderV3.java:251)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.updateCurrBlock(HFileReaderV2.java:956)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:919)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:304)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:269)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:231)
at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:414)
at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:91)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1247)
at 
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1915)
at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:529)
at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}






[jira] [Commented] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-07-09 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537630#comment-16537630
 ] 

Esteban Gutierrez commented on HBASE-20651:
---

I think the test failures like the one from TestJMXConnectorServer are related 
to timeouts. It should be fine if the test is passing locally against branch-1.


> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch, 
> HBASE-20651-branch-1-v002.patch, HBASE-20651-branch-1-v003.patch
>
>
> We are seeing that hbck brings back split parent region and this causes 
> region inconsistency. More details will be filled as reproduce is still 
> ongoing. Might need to do something at hbck or master to prevent this from 
> happening.





[jira] [Commented] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-07-09 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537508#comment-16537508
 ] 

Esteban Gutierrez commented on HBASE-20651:
---

Thanks for the rest of the details [~huaxiang]! Yeah, it is expected that 
multiple offline requests shouldn't be an issue. Based on your testing, this 
sounds good to me.

> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch, 
> HBASE-20651-branch-1-v002.patch, HBASE-20651-branch-1-v003.patch
>
>
> We are seeing that hbck brings back split parent region and this causes 
> region inconsistency. More details will be filled as reproduce is still 
> ongoing. Might need to do something at hbck or master to prevent this from 
> happening.





[jira] [Commented] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-07-03 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531811#comment-16531811
 ] 

Esteban Gutierrez commented on HBASE-20651:
---

Just a quick observation, [~huaxiang] can you please print the region state 
instead of any of the 3 possible states? thanks!

> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch
>
>
> We are seeing that hbck brings back split parent region and this causes 
> region inconsistency. More details will be filled as reproduce is still 
> ongoing. Might need to do something at hbck or master to prevent this from 
> happening.





[jira] [Commented] (HBASE-20761) FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in some edge cases

2018-06-21 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519721#comment-16519721
 ] 

Esteban Gutierrez commented on HBASE-20761:
---

I think what [~mdrob] is working on in HBASE-20674 is also relevant here for 
clarity. When SCRs (short-circuit reads) are enabled, {{HFileSystem}} sets 
{{dfs.client.read.shortcircuit.skip.checksum}} to {{true}}, so HDFS checksums 
are skipped. That means that for edge cases like this one, the only way to 
recover from a corrupt block header is not just to set 
{{hbase.regionserver.checksum.verify}} to {{false}} to fall back to HDFS 
checksums but also to set {{dfs.client.read.shortcircuit.skip.checksum}} to 
{{false}}.
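
As a rough illustration of the two settings above, an {{hbase-site.xml}} fragment along these lines would turn off HBase-level checksum verification and stop short-circuit reads from skipping HDFS checksums. This is only a sketch: the property names are the ones mentioned above, and whether setting them in {{hbase-site.xml}} is sufficient for a given deployment is an assumption.

```xml
<!-- Sketch only: fall back to HDFS checksums for block reads -->
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>false</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.skip.checksum</name>
  <value>false</value>
</property>
```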

> FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in 
> some edge cases
> --
>
> Key: HBASE-20761
> URL: https://issues.apache.org/jira/browse/HBASE-20761
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
>
> One of our users reported this problem on HBase 1.2 before and after 
> HBASE-11625:
> {code}
> Caused by: java.io.IOException: On-disk size without header provided is 
> 131131, but block header contains 0. Block offset: 2073954793, data starts 
> with: \x00\x00\x00\x00\x00\x00\x00\x0\
> 0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1699)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
> at 
> org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:100)
> {code}
> The problem occurs when we read a block without HDFS checksums enabled and, 
> due to some data corruption, we end up with an empty headerBuf while trying 
> to read the block before the HDFS checksum failover code. This will cause 
> further attempts to read the block to fail, since we will still retry the 
> corrupt replica instead of reporting the corrupt replica and trying a 
> different one. 





[jira] [Assigned] (HBASE-20761) FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in some edge cases

2018-06-20 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-20761:
-

Assignee: Esteban Gutierrez

> FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in 
> some edge cases
> --
>
> Key: HBASE-20761
> URL: https://issues.apache.org/jira/browse/HBASE-20761
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
>
> One of our users reported this problem on HBase 1.2 before and after 
> HBASE-11625:
> {code}
> Caused by: java.io.IOException: On-disk size without header provided is 
> 131131, but block header contains 0. Block offset: 2073954793, data starts 
> with: \x00\x00\x00\x00\x00\x00\x00\x0\
> 0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1699)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
> at 
> org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:100)
> {code}
> The problem occurs when we read a block without HDFS checksums enabled and, 
> due to some data corruption, we end up with an empty headerBuf while trying 
> to read the block before the HDFS checksum failover code. This will cause 
> further attempts to read the block to fail, since we will still retry the 
> corrupt replica instead of reporting the corrupt replica and trying a 
> different one. 





[jira] [Created] (HBASE-20761) FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in some edge cases

2018-06-20 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-20761:
-

 Summary: FSReaderImpl#readBlockDataInternal can fail to switch to 
HDFS checksums in some edge cases
 Key: HBASE-20761
 URL: https://issues.apache.org/jira/browse/HBASE-20761
 Project: HBase
  Issue Type: Bug
  Components: HFile
Reporter: Esteban Gutierrez


One of our users reported this problem on HBase 1.2 before and after 
HBASE-11625:

{code}
Caused by: java.io.IOException: On-disk size without header provided is 131131, 
but block header contains 0. Block offset: 2073954793, data starts with: 
\x00\x00\x00\x00\x00\x00\x00\x0\
0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1699)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
at 
org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:100)
{code}

The problem occurs when we read a block without HDFS checksums enabled and, 
due to some data corruption, we end up with an empty headerBuf while trying to 
read the block before the HDFS checksum failover code. This will cause further 
attempts to read the block to fail, since we will still retry the corrupt 
replica instead of reporting the corrupt replica and trying a different one. 
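
One way to picture the desired behavior is the sketch below. It is not the HBase implementation: {{HeaderReader}} and {{readHeaderWithFallback}} are hypothetical names, and the all-zero check is just one simple way to recognize the kind of header shown in the error above. The idea is to validate the header before trusting it and, if it looks empty, to retry with HDFS checksums enabled so that HDFS gets a chance to detect the corrupt replica and serve a different one.

```java
import java.io.IOException;

final class HeaderFallbackSketch {
    /** Hypothetical stand-in for a block header read; the flag selects the read path. */
    interface HeaderReader {
        byte[] readHeader(boolean useHdfsChecksum) throws IOException;
    }

    /** True when the buffer is empty or every byte is zero. */
    static boolean looksEmpty(byte[] headerBuf) {
        for (byte b : headerBuf) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    /** Reads the header without HDFS checksums first, then falls back if it looks empty. */
    static byte[] readHeaderWithFallback(HeaderReader reader) throws IOException {
        byte[] header = reader.readHeader(false);
        if (!looksEmpty(header)) {
            return header;
        }
        // Suspicious header: retry with HDFS checksums instead of re-reading the same bytes.
        header = reader.readHeader(true);
        if (looksEmpty(header)) {
            throw new IOException("Header is still empty after falling back to HDFS checksums");
        }
        return header;
    }
}
```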










[jira] [Commented] (HBASE-20679) Add the ability to compile JSP dynamically in Jetty

2018-06-05 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502360#comment-16502360
 ] 

Esteban Gutierrez commented on HBASE-20679:
---

I'm -1 on having a way to inject JSPs on the server side, not only because it 
opens the possibility of security issues, but also because, considering the 
maturity of HBase, it gives the impression that we will always have a class of 
issues that cannot be fixed by our regular means: e.g. hbck, reassigning a 
region manually or deploying a fix and performing a rolling restart.

> Add the ability to compile JSP dynamically in Jetty
> ---
>
> Key: HBASE-20679
> URL: https://issues.apache.org/jira/browse/HBASE-20679
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20679.002.patch, HBASE-20679.patch
>
>
> As discussed in HBASE-20617, adding the ability to dynamically compile JSPs 
> enables us to do some hot fixes. 
>  For example, several days ago, in our testing HBase-2.0 cluster, 
> procedureWals were corrupted for some unknown reason. After restarting 
> the cluster, some procedures (AssignProcedure, for example) were 
> corrupted and couldn't be replayed, so some regions were stuck in RIT forever. 
> We couldn't use HBCK since it doesn't support AssignmentV2 yet. As a matter 
> of fact, the namespace region was not online, so the master was not 
> initialized, and we couldn't even use shell commands like assign/move. But we 
> wrote a JSP and fixed this issue easily. The JSP file looks like this:
> {code:java}
> <%
>   String action = request.getParameter("action");
>   HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
>   List<RegionInfo> offlineRegionsToAssign = new ArrayList<>();
>   List<RegionStates.RegionStateNode> regionRITs =
>       master.getAssignmentManager()
>           .getRegionStates().getRegionsInTransition();
>   for (RegionStates.RegionStateNode regionStateNode : regionRITs) {
>     // If regionStateNode doesn't have a procedure attached, but the meta state
>     // shows this region is in RIT, the previous procedure may be corrupted and
>     // we need to create a new AssignProcedure to assign it.
>     if (!regionStateNode.isInTransition()) {
>       offlineRegionsToAssign.add(regionStateNode.getRegionInfo());
>       out.println("RIT region:" + regionStateNode);
>     }
>   }
>   // Assign offline regions. Uses round-robin.
>   if ("fix".equals(action) && offlineRegionsToAssign.size() > 0) {
>     master.getMasterProcedureExecutor().submitProcedures(master.getAssignmentManager()
>         .createRoundRobinAssignProcedures(offlineRegionsToAssign));
>   } else {
>     out.println("use ?action=fix to fix RIT regions");
>   }
> %>
> {code}
> The above is only one example of what we can do if we have the ability to 
> compile JSPs dynamically. We think it is very useful.





[jira] [Assigned] (HBASE-11625) Reading datablock throws "Invalid HFile block magic" and can not switch to hdfs checksum

2018-05-29 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-11625:
-

Assignee: Appy  (was: Esteban Gutierrez)

> Reading datablock throws "Invalid HFile block magic" and can not switch to 
> hdfs checksum 
> -
>
> Key: HBASE-11625
> URL: https://issues.apache.org/jira/browse/HBASE-11625
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.94.21, 0.98.4, 0.98.5, 1.0.1.1, 1.0.3
>Reporter: qian wang
>Assignee: Appy
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 1.1.6, 2.0.0
>
> Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz, 
> HBASE-11625-branch-1-v1.patch, HBASE-11625-branch-1.2-v1.patch, 
> HBASE-11625-branch-1.2-v2.patch, HBASE-11625-branch-1.2-v3.patch, 
> HBASE-11625-branch-1.2-v4.patch, HBASE-11625-master-v2.patch, 
> HBASE-11625-master-v3.patch, HBASE-11625-master.patch, 
> HBASE-11625.branch-1.1.001.patch, HBASE-11625.patch, correct-hfile, 
> corrupted-header-hfile
>
>
> When using HBase checksums, a call to readBlockDataInternal() in 
> hfileblock.java can hit file corruption, but it can only switch to the HDFS 
> checksum input stream once it reaches validateBlockChecksum(). If the data 
> block's header is corrupted when b = new HFileBlock() runs, it throws the 
> exception "Invalid HFile block magic" and the RPC call fails.





[jira] [Assigned] (HBASE-11625) Reading datablock throws "Invalid HFile block magic" and can not switch to hdfs checksum

2018-05-29 Thread Esteban Gutierrez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-11625:
-

Assignee: Esteban Gutierrez  (was: Appy)

> Reading datablock throws "Invalid HFile block magic" and can not switch to 
> hdfs checksum 
> -
>
> Key: HBASE-11625
> URL: https://issues.apache.org/jira/browse/HBASE-11625
> Project: HBase
>  Issue Type: Bug
>  Components: HFile
>Affects Versions: 0.94.21, 0.98.4, 0.98.5, 1.0.1.1, 1.0.3
>Reporter: qian wang
>Assignee: Esteban Gutierrez
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 1.1.6, 2.0.0
>
> Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz, 
> HBASE-11625-branch-1-v1.patch, HBASE-11625-branch-1.2-v1.patch, 
> HBASE-11625-branch-1.2-v2.patch, HBASE-11625-branch-1.2-v3.patch, 
> HBASE-11625-branch-1.2-v4.patch, HBASE-11625-master-v2.patch, 
> HBASE-11625-master-v3.patch, HBASE-11625-master.patch, 
> HBASE-11625.branch-1.1.001.patch, HBASE-11625.patch, correct-hfile, 
> corrupted-header-hfile
>
>
> When using HBase checksums, a call to readBlockDataInternal() in 
> hfileblock.java can hit file corruption, but it can only switch to the HDFS 
> checksum input stream once it reaches validateBlockChecksum(). If the data 
> block's header is corrupted when b = new HFileBlock() runs, it throws the 
> exception "Invalid HFile block magic" and the RPC call fails.





[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-24 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489162#comment-16489162
 ] 

Esteban Gutierrez commented on HBASE-19572:
---

Thanks [~brfrn169]! I will go ahead and commit shortly.


> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, 
> HBASE-19572.master.004.patch, HBASE-19572.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> configured in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.
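
The lookup order described above can be sketched roughly as follows. This is not the RegionMover code: {{java.util.Properties}} stands in for the HBase configuration object so that the example is self-contained, and the class and method names are hypothetical. The key {{hbase.regionserver.port}} and the default of 16020 are the values used by HBase 1.0 and later.

```java
import java.util.Properties;

final class RegionServerPortSketch {
    static final String REGIONSERVER_PORT_KEY = "hbase.regionserver.port";
    // Stand-in for HConstants.DEFAULT_REGIONSERVER_PORT.
    static final int DEFAULT_REGIONSERVER_PORT = 16020;

    /** Prefers the configured port and only then falls back to the hardcoded default. */
    static int resolvePort(Properties conf) {
        String value = conf.getProperty(REGIONSERVER_PORT_KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_REGIONSERVER_PORT;
        }
        return Integer.parseInt(value.trim());
    }
}
```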





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-22 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.002.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-22 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484522#comment-16484522
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

Thanks [~Apache9]. I'm looking into injecting a failure in 
{{ProtobufUtil.mergeFrom()}} or maybe directly into {{FSDataInputStream}} in 
order to have a more accurate test. 

Attaching a new patch that additionally seeks back to the original position 
of the stream when no KVs are present, so an additional read from the stream 
shouldn't trigger an unnecessary EOFException.
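
The seek-back idea can be sketched in isolation as follows. This is not the attached patch: a {{java.nio.ByteBuffer}} stands in for the seekable WAL stream, the 4-byte length prefix is an assumed record format, and the class and method names are hypothetical. When only part of a record is available, the reader restores the original position and reports that nothing was read, so a later call starts again from a clean record boundary.

```java
import java.nio.ByteBuffer;

final class SeekBackReaderSketch {
    /**
     * Tries to read one length-prefixed record (4-byte length, then payload).
     * Returns null and restores the original position if the record is incomplete.
     */
    static byte[] readNext(ByteBuffer in) {
        int originalPosition = in.position();
        if (in.remaining() < Integer.BYTES) {
            return null; // Nothing consumed yet, position is unchanged.
        }
        int length = in.getInt();
        if (length < 0 || in.remaining() < length) {
            in.position(originalPosition); // Seek back: only partial data is available.
            return null;
        }
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }
}
```

A caller that gets null can wait for more data and call readNext again without having to handle an end-of-stream exception for a record that is merely incomplete.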


> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Attachment: HBASE-20604.patch

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Assigned] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-20604:
-

Assignee: Esteban Gutierrez

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Updated] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-20604:
--
Affects Version/s: (was: 1.5.0)
   (was: 2.1.0)
   Status: Patch Available  (was: Open)

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Major
> Attachments: HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Created] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-05-18 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-20604:
-

 Summary: ProtobufLogReader#readNext can incorrectly loop to the 
same position in the stream until the the WAL is rolled
 Key: HBASE-20604
 URL: https://issues.apache.org/jira/browse/HBASE-20604
 Project: HBase
  Issue Type: Bug
  Components: Replication, wal
Affects Versions: 3.0.0, 2.1.0, 1.5.0
Reporter: Esteban Gutierrez


Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
associated with the {{FSDataInputStream}} from the WAL that we are reading. 
Under certain conditions, e.g. when using encryption at rest 
({{CryptoInputStream}}), the stream can return partial data, which can cause a 
premature EOF that causes {{inputStream.getPos()}} to return to the same 
original position, causing {{ProtobufLogReader#readNext}} to retry the reads 
until the WAL is rolled.

The side effect of this issue is that {{ReplicationSource}} can get stuck until 
the WAL is rolled, causing replication delays of up to an hour in some cases.






[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: HBASE-19572.master.003.patch.txt

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> configured in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.





[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: HBASE-19572.master.003.patch

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> configured in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.





[jira] [Updated] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19572:
--
Attachment: (was: HBASE-19572.master.003.patch.txt)

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.master.003.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> configured in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.





[jira] [Commented] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2018-05-14 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474644#comment-16474644
 ] 

Esteban Gutierrez commented on HBASE-19572:
---

lgtm [~brfrn169]. Will upload again, just to make sure it still applies to 
master.

> RegionMover should use the configured default port number and not the one 
> from HConstants
> -
>
> Key: HBASE-19572
> URL: https://issues.apache.org/jira/browse/HBASE-19572
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-19572.master.001.patch, 
> HBASE-19572.master.001.patch, HBASE-19572.patch
>
>
> The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
> configured in {{hbase-site.xml}}. The tool should use the value from the 
> configuration before falling back to the hardcoded value 
> {{HConstants.DEFAULT_REGIONSERVER_PORT}}.





[jira] [Commented] (HBASE-19994) Create a new class for RPC throttling exception, make it retryable.

2018-04-06 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428913#comment-16428913
 ] 

Esteban Gutierrez commented on HBASE-19994:
---

+1 but can we add release notes about the new exception and how this will 
impact clients during a rolling restart of HBase where quotas are being used? 
Thanks!

> Create a new class for RPC throttling exception, make it retryable. 
> 
>
> Key: HBASE-19994
> URL: https://issues.apache.org/jira/browse/HBASE-19994
> Project: HBase
>  Issue Type: Improvement
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Major
> Attachments: HBASE-19994-master-v01.patch, 
> HBASE-19994-master-v02.patch, HBASE-19994-master-v03.patch, 
> HBASE-19994-master-v04.patch, HBASE-19994-master-v05.patch, 
> HBASE-19994-master-v06.patch, HBASE-19994-master-v07.patch
>
>
> Based on a discussion on the dev mailing list.
>  
> {code:java}
> Thanks Andrew.
> +1 for the second option, I will create a jira for this change.
> Huaxiang
> On Feb 9, 2018, at 1:09 PM, Andrew Purtell  wrote:
> We have
> public class ThrottlingException extends QuotaExceededException
> public class QuotaExceededException extends DoNotRetryIOException
> Let the storage quota limits throw QuotaExceededException directly (based
> on DNRIOE). That seems fine.
> However, ThrottlingException is thrown as a result of a temporal quota,
> so it is inappropriate for this to inherit from DNRIOE, it should inherit
> IOException instead so the client is allowed to retry until successful, or
> until the retry policy is exhausted.
> We are in a bit of a pickle because we've released with this inheritance
> hierarchy, so to change it we will need a new minor, or we will want to
> deprecate ThrottlingException and use a new exception class instead, one
> which does not inherit from DNRIOE.
> On Feb 7, 2018, at 9:25 AM, Huaxiang Sun  wrote:
> Hi Mike,
>   You are right. For rpc throttling, it is definitely retryable. For storage 
> quota, I think it should fail fast (non-retryable).
>   We probably need to separate these two types of exceptions, I will do some 
> more research and follow up.
>   Thanks,
>   Huaxiang
> On Feb 7, 2018, at 9:16 AM, Mike Drob  wrote:
> I think, philosophically, there can be two kinds of QEE -
> For throttling, we can retry. The quota is a temporal quota - you have done
> too many operations this minute, please try again next minute and
> everything will work.
> For storage, we shouldn't retry. The quota is a fixed quota - you have
> exceeded your allotted disk space, please do not try again until you have
> remedied the situation.
> Our current usage conflates the two, sometimes it is correct, sometimes not.
> On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun  wrote:
> Hi Stack,
>  I run into a case that a mapreduce job in hive cannot finish because
> it runs into a QEE.
> I need to look into the hive mr task to see if QEE is not handled
> correctly in hbase code or in hive code.
> I am thinking that if  QEE is a retryable exception, then it should be
> taken care of by the hbase code.
> I will check more and report back.
> Thanks,
> Huaxiang
> On Feb 7, 2018, at 8:23 AM, Stack  wrote:
> QEE being a DNRIOE seems right on the face of it.
> But if throttling, a DNRIOE is inappropriate. Where are you seeing a QEE in a
> throttling scenario, Huaxiang?
> Thanks,
> S
> {code}
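
The split discussed in the thread can be sketched with a small, self-contained hierarchy. The classes below are local stand-ins written for this illustration only: {{DoNotRetryIOException}} and {{QuotaExceededException}} reuse the names from the thread but are not the real HBase classes, and {{RetryableThrottlingException}} and {{RetryingCaller}} are hypothetical names, not what the eventual patch uses.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Local stand-in for illustration; the real class lives in org.apache.hadoop.hbase.
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
}

// Storage quota violations stay non-retryable, as suggested in the thread.
class QuotaExceededException extends DoNotRetryIOException {
    QuotaExceededException(String msg) { super(msg); }
}

// Hypothetical name: a throttling exception that is a plain IOException,
// so a retrying caller is allowed to try again.
class RetryableThrottlingException extends IOException {
    RetryableThrottlingException(String msg) { super(msg); }
}

final class RetryingCaller {
    /** Runs the call up to maxAttempts times; DoNotRetryIOException is rethrown at once. */
    static <T> T call(Callable<T> callable, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callable.call();
            } catch (DoNotRetryIOException e) {
                throw e; // fail fast, e.g. a storage quota violation
            } catch (IOException e) {
                last = e; // retryable; a real client would back off before the next attempt
            }
        }
        throw last; // retry policy exhausted
    }
}
```

With this shape, a throttled call is retried until it succeeds or the attempts run out, while a storage quota violation surfaces on the first attempt.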





[jira] [Created] (HBASE-19741) Port CSRF prevention filter (HBASE-15187) to the HBase Thrift server

2018-01-09 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19741:
-

 Summary: Port CSRF prevention filter (HBASE-15187) to the HBase 
Thrift server
 Key: HBASE-19741
 URL: https://issues.apache.org/jira/browse/HBASE-19741
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez
Priority: Minor


Our Thrift server is prone to the same CSRF issue described in HBASE-15187. 
Even though it only affects browsers, it triggers a positive match in some 
vulnerability scanners even though there is no real impact. We should correct our 
headers in the HBase Thrift server to avoid that problem.
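
For readers unfamiliar with this style of CSRF prevention, the core check can be sketched as follows. This is only an illustration: the header name and the list of ignored methods are assumptions modeled on common custom-header CSRF filters, and {{CsrfCheck}} is a hypothetical class, not the code from HBASE-15187 or the Thrift server.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative check only; names and defaults here are assumptions.
final class CsrfCheck {
    static final String CUSTOM_HEADER = "X-XSRF-HEADER";
    private static final Set<String> METHODS_TO_IGNORE =
        new HashSet<>(Arrays.asList("GET", "HEAD", "OPTIONS"));

    /** Returns true when the request may proceed. Header keys are matched exactly. */
    static boolean isAllowed(String method, Map<String, String> headers) {
        if (METHODS_TO_IGNORE.contains(method.toUpperCase(Locale.ROOT))) {
            return true; // read-only methods are not checked
        }
        // A plain cross-site form post cannot add a custom header, so its
        // presence is taken as evidence the request came from a deliberate client.
        return headers.containsKey(CUSTOM_HEADER);
    }
}
```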



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants

2017-12-20 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19572:
-

 Summary: RegionMover should use the configured default port number 
and not the one from HConstants
 Key: HBASE-19572
 URL: https://issues.apache.org/jira/browse/HBASE-19572
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez


The issue I ran into in HBASE-19499 was due to RegionMover not using the port 
configured in {{hbase-site.xml}}. The tool should use the value from the 
configuration before falling back to the hardcoded value 
{{HConstants.DEFAULT_REGIONSERVER_PORT}}.
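
The lookup order described here (configured value first, hardcoded default second) could look roughly like the sketch below. It uses a plain {{Map}} instead of the Hadoop {{Configuration}} class to stay self-contained. The key {{hbase.regionserver.port}} and the default 16020 are assumptions about what the HBase constants resolve to, and {{RegionServerPort}} is a hypothetical class.

```java
import java.util.Map;

final class RegionServerPort {
    // Assumed to mirror the HBase setting and its default; verify against HConstants.
    static final String REGIONSERVER_PORT_KEY = "hbase.regionserver.port";
    static final int DEFAULT_REGIONSERVER_PORT = 16020;

    /** Returns the configured port, or the default when the key is unset or blank. */
    static int resolve(Map<String, String> conf) {
        String value = conf.get(REGIONSERVER_PORT_KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_REGIONSERVER_PORT; // fall back only when nothing is configured
        }
        return Integer.parseInt(value.trim());
    }
}
```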





[jira] [Commented] (HBASE-19499) RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully

2017-12-20 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299108#comment-16299108
 ] 

Esteban Gutierrez commented on HBASE-19499:
---

Tried to reproduce and it was an error due to the argument passed to the 
RegionMover.

> RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
> -
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Resolved] (HBASE-19499) RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully

2017-12-20 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-19499.
---
Resolution: Not A Bug

> RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
> -
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Updated] (HBASE-19391) Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory

2017-12-20 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19391:
--
Status: Patch Available  (was: Open)

> Calling HRegion#initializeRegionInternals from a region replica can still 
> re-create a region directory
> --
>
> Key: HBASE-19391
> URL: https://issues.apache.org/jira/browse/HBASE-19391
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-19391.master.v0.patch
>
>
> This is a follow-up to HBASE-18024. There is still a chance that attempting 
> to open a region that is not the default region replica can re-create a 
> region directory already GC'd by the CatalogJanitor, causing inconsistencies with hbck.





[jira] [Updated] (HBASE-19391) Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory

2017-12-20 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19391:
--
Attachment: HBASE-19391.master.v0.patch

> Calling HRegion#initializeRegionInternals from a region replica can still 
> re-create a region directory
> --
>
> Key: HBASE-19391
> URL: https://issues.apache.org/jira/browse/HBASE-19391
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-19391.master.v0.patch
>
>
> This is a follow-up to HBASE-18024. There is still a chance that attempting 
> to open a region that is not the default region replica can re-create a 
> region directory already GC'd by the CatalogJanitor, causing inconsistencies with hbck.





[jira] [Commented] (HBASE-19499) RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully

2017-12-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288459#comment-16288459
 ] 

Esteban Gutierrez commented on HBASE-19499:
---

In fact, what causes the RegionMover to fail is this:
{code}
7/12/12 11:00:28 ERROR util.RegionMover: Error while unloading regions
java.lang.Exception: Server host1.example.com:22001 is not in list of online servers(Offline/Incorrect)
    at org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
    at org.apache.hadoop.hbase.util.RegionMover.access$1500(RegionMover.java:78)
    at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:336)
    at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}


> RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
> -
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Updated] (HBASE-19499) RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully

2017-12-12 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19499:
--
Summary: RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 
gracefully  (was: RegionMover#stripMaster is not longer necessary in 
RegionMover)

> RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
> -
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Commented] (HBASE-19499) RegionMover#stripMaster is not longer necessary in RegionMover

2017-12-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288438#comment-16288438
 ] 

Esteban Gutierrez commented on HBASE-19499:
---

[~stack] pointed to HBASE-18511. We cannot just remove that test, but if 
{{hbase.balancer.tablesOnMaster.systemTablesOnly}} and 
{{hbase.balancer.tablesOnMaster}} are enabled we can be more flexible in order 
to avoid that failure.
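
One possible reading of "more flexible" is sketched below: when both settings are enabled, a master that is missing from the server list is tolerated instead of raising an error. This is an assumption about the intent, not the actual RegionMover change. {{MasterStripper}} is a hypothetical class and a plain {{Map}} stands in for the HBase configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MasterStripper {
    static final String TABLES_ON_MASTER = "hbase.balancer.tablesOnMaster";
    static final String SYSTEM_TABLES_ONLY = "hbase.balancer.tablesOnMaster.systemTablesOnly";

    private static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /** Returns a copy of servers without the master; strict unless both flags are enabled. */
    static List<String> stripMaster(List<String> servers, String master, Map<String, String> conf) {
        List<String> result = new ArrayList<>(servers);
        boolean removed = result.remove(master); // false when the master is not in the list
        boolean lenient = flag(conf, TABLES_ON_MASTER) && flag(conf, SYSTEM_TABLES_ONLY);
        if (!removed && !lenient) {
            throw new IllegalStateException(
                "Server " + master + " is not in list of online servers");
        }
        return result;
    }
}
```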

> RegionMover#stripMaster is not longer necessary in RegionMover
> --
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Comment Edited] (HBASE-19499) RegionMover#stripMaster is not longer necessary in RegionMover

2017-12-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288438#comment-16288438
 ] 

Esteban Gutierrez edited comment on HBASE-19499 at 12/12/17 11:12 PM:
--

[~stack] pointed to HBASE-18511. We cannot just remove that condition, but 
if {{hbase.balancer.tablesOnMaster.systemTablesOnly}} and 
{{hbase.balancer.tablesOnMaster}} are enabled we can be more flexible in order 
to avoid that failure.


was (Author: esteban):
[~stack] pointed out to HBASE-18511. We cannot just remove that test, but we if 
{{hbase.balancer.tablesOnMaster.systemTablesOnly}} and 
{{hbase.balancer.tablesOnMaster}} are enabled we can be more flexible in order 
to avoid that failure.

> RegionMover#stripMaster is not longer necessary in RegionMover
> --
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>
> Probably this is the first of few issues found during some tests with 
> RegionMover. After HBASE-13014 we ship the new RegionMover tool but it 
> currently assumes that master will be hosting regions so it attempts to 
> remove master from the list and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of 
> RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online 
> servers(Offline/Incorrect)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at 
> org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at 
> org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically





[jira] [Created] (HBASE-19499) RegionMover#stripMaster is not longer necessary in RegionMover

2017-12-12 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19499:
-

 Summary: RegionMover#stripMaster is not longer necessary in 
RegionMover
 Key: HBASE-19499
 URL: https://issues.apache.org/jira/browse/HBASE-19499
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Esteban Gutierrez


Probably this is the first of a few issues found during some tests with 
RegionMover. After HBASE-13014 we ship the new RegionMover tool, but it 
currently assumes that the master will be hosting regions, so it attempts to 
remove the master from the list, which causes an issue similar to this:

{code}
17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of RS
java.lang.Exception: Server host1.example.com:22001 is not in list of online servers(Offline/Incorrect)
    at org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
    at org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
    at org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
    at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
    at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

Basically





[jira] [Commented] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2017-12-01 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275006#comment-16275006
 ] 

Esteban Gutierrez commented on HBASE-19352:
---

[~mdrob], yeah. The same should go into branch-1 since the issue is about 
consistency of the security flags for the auth cookies.

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix too.





[jira] [Commented] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2017-11-30 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273132#comment-16273132
 ] 

Esteban Gutierrez commented on HBASE-19352:
---

It should be fine with any 9.3

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix too.





[jira] [Updated] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2017-11-30 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19352:
--
Assignee: Esteban Gutierrez
  Status: Patch Available  (was: Open)

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix too.





[jira] [Created] (HBASE-19391) Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory

2017-11-30 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19391:
-

 Summary: Calling HRegion#initializeRegionInternals from a region 
replica can still re-create a region directory
 Key: HBASE-19391
 URL: https://issues.apache.org/jira/browse/HBASE-19391
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez


This is a follow-up to HBASE-18024. There is still a chance that attempting to 
open a region that is not the default region replica can re-create a region 
directory already GC'd by the CatalogJanitor, causing inconsistencies with hbck.
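
The guard this issue asks for can be illustrated with a small sketch on the local filesystem: only the default replica may create a missing region directory, and any other replica leaves it absent. {{RegionDirGuard}} is a hypothetical class, replica id 0 is assumed to denote the default replica, and {{java.nio.file}} stands in for the HBase filesystem layer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RegionDirGuard {
    // Assumption: 0 identifies the default (primary) region replica.
    static final int DEFAULT_REPLICA_ID = 0;

    /** Returns true if the region directory exists after the call. */
    static boolean ensureRegionDir(Path regionDir, int replicaId) throws IOException {
        if (Files.isDirectory(regionDir)) {
            return true; // nothing to create
        }
        if (replicaId != DEFAULT_REPLICA_ID) {
            // A non-default replica must not bring back a directory
            // that may have been cleaned up on purpose.
            return false;
        }
        Files.createDirectories(regionDir);
        return true;
    }
}
```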





[jira] [Created] (HBASE-19390) Revert to older version of Jetty 9.3

2017-11-30 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19390:
-

 Summary: Revert to older version of Jetty 9.3 
 Key: HBASE-19390
 URL: https://issues.apache.org/jira/browse/HBASE-19390
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez


As discussed in HBASE-19256, we will have to temporarily revert to Jetty 9.3 due to 
existing issues with 9.4 and Hadoop3. Once HBASE-19256 is resolved we can 
move back to 9.4.





[jira] [Updated] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2017-11-29 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-19352:
--
Attachment: HBASE-19352.master.v0.patch

Patch depends on Jetty 9.3.20 for now.

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and 
> Secure flags
> 
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
> Attachments: HBASE-19352.master.v0.patch
>
>
> This came up via a security scanner. Since we have a fork of HttpServer2 in 
> HBase, we should include the fix too.





[jira] [Commented] (HBASE-19256) [hbase-thirdparty] shade jetty

2017-11-27 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267641#comment-16267641
 ] 

Esteban Gutierrez commented on HBASE-19256:
---

I'm +1 for reverting.

> [hbase-thirdparty] shade jetty
> --
>
> Key: HBASE-19256
> URL: https://issues.apache.org/jira/browse/HBASE-19256
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, thirdparty
>Reporter: Mike Drob
>Assignee: Mike Drob
> Fix For: thirdparty-1.0.2
>
>






[jira] [Created] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags

2017-11-27 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19352:
-

 Summary: Port HADOOP-10379: Protect authentication cookies with 
the HttpOnly and Secure flags
 Key: HBASE-19352
 URL: https://issues.apache.org/jira/browse/HBASE-19352
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez


This came up via a security scanner. Since we have a fork of HttpServer2 in HBase, 
we should include the fix too.
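
As a reminder of what the two flags look like on the wire, here is a minimal sketch that builds a {{Set-Cookie}} header value by hand. {{AuthCookieHeader}} is a hypothetical helper written for this note, not the HttpServer2 code, and the {{Path=/}} attribute is an assumption added for completeness.

```java
final class AuthCookieHeader {
    /** Builds a Set-Cookie value with HttpOnly always set and Secure only over HTTPS. */
    static String build(String name, String value, boolean isHttps) {
        StringBuilder sb = new StringBuilder(name).append('=').append(value);
        sb.append("; Path=/");
        if (isHttps) {
            sb.append("; Secure"); // browser sends the cookie only over TLS
        }
        sb.append("; HttpOnly"); // cookie is hidden from page scripts
        return sb.toString();
    }
}
```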



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-11-20 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez resolved HBASE-18987.
---
Resolution: Later

Resolving as Later, since we could only do this with a new HFile format.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Created] (HBASE-19309) Lower HConstants#MAX_ROW_LENGTH as guardrail in order to avoid HBASE-18987

2017-11-20 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19309:
-

 Summary: Lower HConstants#MAX_ROW_LENGTH as guardrail in order to 
avoid HBASE-18987
 Key: HBASE-19309
 URL: https://issues.apache.org/jira/browse/HBASE-19309
 Project: HBase
  Issue Type: Bug
  Components: HFile, regionserver
Reporter: Esteban Gutierrez


As discussed in HBASE-18987, a problem with allowing a row key at about the 
maximum row size (Short.MAX_VALUE) shows up when a split happens: the midkey 
could be that row, and the Put created to add the new entry in META will then 
exceed the maximum row size, since the new row key also includes the table 
name. That causes the split to abort. Since it is not possible to raise 
the row key size in HFileV3, a reasonable solution is to reduce the maximum 
size of a row key in order to avoid exceeding Short.MAX_VALUE.
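
A rough sketch of the arithmetic behind the guard rail follows. It assumes the 
META row has the layout <table>,<startKey>,<regionId>.<encodedName>. with a 
32-character hex encoded name. The class and method names are hypothetical and 
this is not the HBase implementation.

```java
import java.nio.charset.StandardCharsets;

final class RowLengthGuardrail {

  /** Upper bound on a row key, as discussed in this issue. */
  static final int MAX_ROW_LENGTH = Short.MAX_VALUE; // 32767

  /** Assumed length of the hex-encoded region name suffix. */
  static final int ENCODED_NAME_LENGTH = 32;

  /**
   * Length of the META row for a region, under the assumed layout
   * table + ',' + startKey + ',' + regionId + '.' + encodedName + '.'.
   */
  static int metaRowLength(String table, int startKeyLength, long regionId) {
    int tableLength = table.getBytes(StandardCharsets.UTF_8).length;
    int regionIdLength = Long.toString(regionId).length();
    return tableLength + 1 + startKeyLength + 1 + regionIdLength + 1
        + ENCODED_NAME_LENGTH + 1;
  }

  /** Largest user row key that still yields a META row within the limit. */
  static int maxSafeRowLength(String table, long regionId) {
    return MAX_ROW_LENGTH - metaRowLength(table, 0, regionId);
  }

  public static void main(String[] args) {
    long regionId = 1511136000000L; // hypothetical region creation timestamp
    System.out.println(metaRowLength("t1", Short.MAX_VALUE, regionId));
    System.out.println(maxSafeRowLength("t1", regionId));
  }
}
```

Under these assumptions the overhead depends on the table name, so a single 
lowered constant would have to leave room for the longest table name allowed.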





[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-11-20 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259848#comment-16259848
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

After some offline discussion with [~mdrob] around the same comments from 
[~anoopsamjohn], the only way to address this correctly is a new 
HFileV4 format without this kind of limitation. For now I'm going to create a 
new issue to add a guard rail that rejects keys close to 
{{Short.MAX_VALUE}} in order to avoid triggering this problem.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Commented] (HBASE-19030) nightly runs should attempt to log test results after archiving

2017-10-25 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218804#comment-16218804
 ] 

Esteban Gutierrez commented on HBASE-19030:
---

+1

> nightly runs should attempt to log test results after archiving
> ---
>
> Key: HBASE-19030
> URL: https://issues.apache.org/jira/browse/HBASE-19030
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Critical
> Attachments: HBASE-19030.0.patch
>
>
> right now on the nightly tests the first post-action we do is log junit 
> results. due to current limitations of Jenkins DSL, if this step fails none 
> of the other post actions will happen.
> Since we might not make junit test results, e.g. in the case of a timeout of 
> yetus itself, we should log the junit results after we've saved whatever we 
> can of yetus output.





[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202225#comment-16202225
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

[~mdrob], that's correct, and there is no way to avoid it other than rejecting 
keys that would max out the row key length. Basically, instead of checkRow() 
verifying whether a row is larger than Short.MAX_VALUE, we should verify whether 
the row plus the region name overhead will exceed Short.MAX_VALUE later on 
a split.

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-12 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202042#comment-16202042
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

bq. Not in testOversizedRegionNameForPut 
[~mdrob]: thanks!

bq. The value length can be upto Integer.MAX_VALUE - 1 as we use 4 bytes to 
store that. But for the row length it is 2 bytes right? Then allowing 
Integer.MAX_VALUE - 1 for RK length also correct?

[~anoopsamjohn]: yeah, you are right. The problem seems to run deeper: the 
KeyValue constructor accepts an integer for rlength, but there are a few more 
places where we only use a short: {{createEmptyByteArray}} will test if rlength 
is greater than {{Short.MAX_VALUE}}, and {{rowLen}} on {{KeyOnlyKeyValue}} is a 
short. Also, {{KEYVALUE_INFRASTRUCTURE_SIZE}} depends on ROW_LENGTH_SIZE, which 
is {{Bytes.SIZEOF_SHORT}}. My test didn't catch that since you need to go all 
the way to serializing the KV.

I think I'm -1 now for this and the truncate approach might be the only 
alternative for now.
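
To make the 2-byte limitation concrete, here is a small standalone sketch. It 
does not use the KeyValue classes. It only shows what happens when an int row 
length is narrowed to a short and written as a 2-byte prefix. The class and 
method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ShortRowLengthDemo {

  /** Writes the row length as a 2-byte prefix and reads it back. */
  static int roundTripRowLength(int rlength) {
    ByteBuffer buf = ByteBuffer.allocate(Short.BYTES);
    buf.putShort((short) rlength); // narrowing cast keeps only the low 16 bits
    buf.flip();
    return buf.getShort();
  }

  /** Rejects lengths that cannot be represented in a signed 2-byte field. */
  static void checkRowLength(int rlength) {
    if (rlength < 0 || rlength > Short.MAX_VALUE) {
      throw new IllegalArgumentException(
          "Row length " + rlength + " is outside [0, " + Short.MAX_VALUE + "]");
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTripRowLength(Short.MAX_VALUE));
    System.out.println(roundTripRowLength(Short.MAX_VALUE + 1));
  }
}
```

Lengths up to Short.MAX_VALUE survive the round trip. Anything larger comes 
back as a negative number, which is why raising only the constant is not enough 
while the serialized row length stays 2 bytes wide.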


> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Commented] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201067#comment-16201067
 ] 

Esteban Gutierrez commented on HBASE-18987:
---

[~mdrob],  {{nameStr}} is being used.
{code}
String nameStr = Bytes.toString(name);
assertTrue(nameStr.length() <= HConstants.MAX_ROW_LENGTH);
{code}


> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Updated] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18987:
--
Attachment: HBASE-18987.master.002.patch

Yeah, the md5HashInHex stuff is a remnant of the previous attempt. Also 
addressed the test case suggestion. Thanks for the review, [~mdrob].

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch, 
> HBASE-18987.master.002.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Updated] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18987:
--
Attachment: HBASE-18987.master.001.patch

> Raise value of HConstants#MAX_ROW_LENGTH
> 
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Minor
> Attachments: HBASE-18987.master.001.patch
>
>
> Short.MAX_VALUE hasn't been a problem for a long time but one of our 
> customers ran into an edge case when the midKey used for the split point was 
> very close to Short.MAX_VALUE. When the split is submitted, we attempt to 
> create the new two daughter regions and we name those regions via 
> {{HRegionInfo.createRegionName()}} in order to be added to META. 
> Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the 
> startKey {{Put}} will fail since the row key length will now fail checkRow 
> and thus causing the split to fail.
> I tried a couple of alternatives to address this problem, e.g. truncating the 
> startKey. But the number of changes in the code doesn't justify for this edge 
> condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
> {{HConstants#MAXIMUM_VALUE_LENGTH}} it should be ok to use the same limit for 
> the maximum row key. 





[jira] [Created] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH

2017-10-11 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-18987:
-

 Summary: Raise value of HConstants#MAX_ROW_LENGTH
 Key: HBASE-18987
 URL: https://issues.apache.org/jira/browse/HBASE-18987
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Minor


Short.MAX_VALUE hasn't been a problem for a long time, but one of our customers 
ran into an edge case where the midKey used for the split point was very close 
to Short.MAX_VALUE. When the split is submitted, we attempt to create the two 
new daughter regions, and we name those regions via 
{{HRegionInfo.createRegionName()}} in order to add them to META. Unfortunately, 
since {{HRegionInfo.createRegionName()}} uses midKey as the startKey, the {{Put}} 
fails because the resulting row key is too long to pass checkRow, which causes 
the split to fail.

I tried a couple of alternatives to address this problem, e.g. truncating the 
startKey, but the number of code changes isn't justified for this edge 
condition. Since we already use {{Integer.MAX_VALUE - 1}} for 
{{HConstants#MAXIMUM_VALUE_LENGTH}}, it should be OK to use the same limit for 
the maximum row key. 





[jira] [Commented] (HBASE-17799) HBCK region boundaries check can return false negatives when IOExceptions are thrown

2017-09-27 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183169#comment-16183169
 ] 

Esteban Gutierrez commented on HBASE-17799:
---

[~mdrob] sure, let me work on that. Thanks!

> HBCK region boundaries check can return false negatives when IOExceptions are 
> thrown
> 
>
> Key: HBASE-17799
> URL: https://issues.apache.org/jira/browse/HBASE-17799
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-17799.master.001.patch, 
> HBASE-17799.master.002.patch
>
>
> When enabled, HBaseFsck#checkRegionBoundaries will crawl all HFiles across 
> all namespaces and tables when {{-boundaries}} is specified. However if an 
> IOException is thrown by accessing a corrupt HFile, an un-handled HLink or by 
> any other reason, we will only log the exception and stop crawling the HFiles 
> and potentially reporting the wrong result.





[jira] [Commented] (HBASE-16478) Rename WALKey in PB to WALEdit

2017-09-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161816#comment-16161816
 ] 

Esteban Gutierrez commented on HBASE-16478:
---

If [~enis] says it's OK, then it should be OK. I was more concerned about how 
{{WAL.Entry}} would be impacted by replication between branch-1 and branch-2. 

> Rename WALKey in PB to WALEdit
> --
>
> Key: HBASE-16478
> URL: https://issues.apache.org/jira/browse/HBASE-16478
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0
>
> Attachments: HBASE-16478.master.001.patch, 
> HBASE-16478.master.001.patch, hbase-16478_v1.patch
>
>
> As per title. 





[jira] [Commented] (HBASE-18675) Making {max,min}SessionTimeout configurable for MiniZooKeeperCluster

2017-08-24 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140671#comment-16140671
 ] 

Esteban Gutierrez commented on HBASE-18675:
---

Thanks [~beettlle]. Maybe check if tickTime was modified and scale accordingly? 
In {{HMasterCommandLine#startMaster}} we have the option to set the tickTime 
when we start HBase in {{local}} mode (embedded ZK), so this might cause some 
surprises for users who have changed the tickTime to a different value.
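
As a sketch of what "scale accordingly" could mean: ZooKeeper's documented 
defaults are minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * 
tickTime, and the server clamps the timeout a client requests into that range. 
The helper below only reproduces that arithmetic. The class and method names 
are hypothetical, and it is not the MiniZooKeeperCluster patch.

```java
final class ZkSessionTimeouts {

  /** ZooKeeper default: minimum session timeout is 2 ticks. */
  static int defaultMinSessionTimeout(int tickTimeMs) {
    return 2 * tickTimeMs;
  }

  /** ZooKeeper default: maximum session timeout is 20 ticks. */
  static int defaultMaxSessionTimeout(int tickTimeMs) {
    return 20 * tickTimeMs;
  }

  /** Clamps the timeout requested by a client into [min, max]. */
  static int negotiate(int requestedMs, int minMs, int maxMs) {
    return Math.max(minMs, Math.min(maxMs, requestedMs));
  }

  public static void main(String[] args) {
    int tickTimeMs = 2000; // example value; a user may have configured another
    int min = defaultMinSessionTimeout(tickTimeMs);
    int max = defaultMaxSessionTimeout(tickTimeMs);
    System.out.println(negotiate(90000, min, max));
  }
}
```

Deriving both bounds from the configured tickTime keeps them consistent when a 
user changes it, which is the surprise the comment above is worried about.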

> Making {max,min}SessionTimeout configurable for MiniZooKeeperCluster
> 
>
> Key: HBASE-18675
> URL: https://issues.apache.org/jira/browse/HBASE-18675
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Cesar Delgado
>Assignee: Cesar Delgado
>Priority: Minor
> Attachments: MiniZooKeeperCluster_HBASE_8675.patch, 
> MiniZooKeeperCluster_HBASE_8675.patch
>
>
> Right now the mini cluster on application developers' laptops keeps crashing 
> when the laptop goes to sleep because Zookeeper times out. We've seen this 
> for a while and [~ekoontz] had worked on it before. Now that we tried to 
> upgrade it's bitten us so we'd like to push this up as I'm sure we're not the 
> only ones getting bitten by this.





[jira] [Commented] (HBASE-18675) Making {max,min}SessionTimeout configurable for MiniZooKeeperCluster

2017-08-24 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140631#comment-16140631
 ] 

Esteban Gutierrez commented on HBASE-18675:
---

As [~tedyu] mentioned, the values should be different. Also, keep in mind that 
it might be a good idea to use the ZK defaults instead of a random value. 
Thanks for working on this [~beettlle]

> Making {max,min}SessionTimeout configurable for MiniZooKeeperCluster
> 
>
> Key: HBASE-18675
> URL: https://issues.apache.org/jira/browse/HBASE-18675
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Cesar Delgado
>Assignee: Cesar Delgado
>Priority: Minor
> Attachments: MiniZooKeeperCluster_HBASE_8675.patch
>
>
> Right now the mini cluster on application developers' laptops keeps crashing 
> when the laptop goes to sleep because Zookeeper times out. We've seen this 
> for a while and [~ekoontz] had worked on it before. Now that we tried to 
> upgrade it's bitten us so we'd like to push this up as I'm sure we're not the 
> only ones getting bitten by this.





[jira] [Commented] (HBASE-18596) A hbase1 cluster should be able to replicate to a hbase2 cluster; verify

2017-08-23 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138717#comment-16138717
 ] 

Esteban Gutierrez commented on HBASE-18596:
---

An initial run using PE to replicate from HBase 1.2 to HBase 2.0 has worked so 
far, and replication from HBase 2.0 to HBase 1.2 also worked without any major 
issue. I will try to code this into a proper integration test in 
IntegrationTestReplication.

> A hbase1 cluster should be able to replicate to a hbase2 cluster; verify
> 
>
> Key: HBASE-18596
> URL: https://issues.apache.org/jira/browse/HBASE-18596
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Esteban Gutierrez
>Priority: Blocker
> Fix For: 2.0.0-alpha-3
>
>
> From the mailing list thread "[DISCUSS] hbase-2.0.0 compatibility 
> expectations", [~esteban] asks:
> bq. Should we add additional details around replication as well? for 
> instance, shall we consider a hbase-1.x cluster as a client for a hbase-2.x 
> cluster?
> The latter should be a blocker. Verify it works.





[jira] [Assigned] (HBASE-18596) A hbase1 cluster should be able to replicate to a hbase2 cluster; verify

2017-08-15 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez reassigned HBASE-18596:
-

Assignee: Esteban Gutierrez

> A hbase1 cluster should be able to replicate to a hbase2 cluster; verify
> 
>
> Key: HBASE-18596
> URL: https://issues.apache.org/jira/browse/HBASE-18596
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Esteban Gutierrez
>Priority: Blocker
> Fix For: 2.0.0
>
>
> From the mailing list thread "[DISCUSS] hbase-2.0.0 compatibility 
> expectations", [~esteban] asks:
> bq. Should we add additional details around replication as well? for 
> instance, shall we consider a hbase-1.x cluster as a client for a hbase-2.x 
> cluster?
> The latter should be a blocker. Verify it works.





[jira] [Commented] (HBASE-18596) A hbase1 cluster should be able to replicate to a hbase2 cluster; verify

2017-08-15 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127785#comment-16127785
 ] 

Esteban Gutierrez commented on HBASE-18596:
---

Will work on this.

> A hbase1 cluster should be able to replicate to a hbase2 cluster; verify
> 
>
> Key: HBASE-18596
> URL: https://issues.apache.org/jira/browse/HBASE-18596
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Esteban Gutierrez
>Priority: Blocker
> Fix For: 2.0.0
>
>
> From the mailing list thread "[DISCUSS] hbase-2.0.0 compatibility 
> expectations", [~esteban] asks:
> bq. Should we add additional details around replication as well? for 
> instance, shall we consider a hbase-1.x cluster as a client for a hbase-2.x 
> cluster?
> The latter should be a blocker. Verify it works.





[jira] [Updated] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18025:
--
   Resolution: Fixed
Fix Version/s: 1.5.0
   3.0.0
   2.0.0
   Status: Resolved  (was: Patch Available)

Thanks

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Fix For: 2.0.0, 3.0.0, 1.5.0
>
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch, HBASE-18025.005.patch, 
> HBASE-18025-branch-1.005.patch, HBASE-18025-branch-1.006.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, the RegionServer creates a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Updated] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18025:
--
Attachment: HBASE-18025-branch-1.006.patch

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch, HBASE-18025.005.patch, 
> HBASE-18025-branch-1.005.patch, HBASE-18025-branch-1.006.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, the RegionServer creates a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Commented] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123849#comment-16123849
 ] 

Esteban Gutierrez commented on HBASE-18025:
---

Yes, will upload a new version. Thanks for the review boss.

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch, HBASE-18025.005.patch, 
> HBASE-18025-branch-1.005.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, the RegionServer creates a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Updated] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18025:
--
Attachment: HBASE-18025-branch-1.005.patch

Test failures seem unrelated, uploading patch for branch-1.

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch, HBASE-18025.005.patch, 
> HBASE-18025-branch-1.005.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, the RegionServer creates a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Updated] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18025:
--
Attachment: HBASE-18025.005.patch

Thanks [~tedyu]. Attached a new patch, please let me know if you have further 
comments.

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch, HBASE-18025.005.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, the RegionServer creates a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Updated] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM

2017-08-11 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18025:
--
Attachment: HBASE-18025.004.patch

New patch with a better test to verify that the in-memory states have been removed.

> CatalogJanitor should collect outdated RegionStates from the AM
> ---
>
> Key: HBASE-18025
> URL: https://issues.apache.org/jira/browse/HBASE-18025
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Attachments: HBASE-18025.001.patch, HBASE-18025.002.patch, 
> HBASE-18025.003.patch, HBASE-18025.004.patch
>
>
> I don't think this will matter in the long run for HBase 2, but at least in 
> branch-1 and the current master we keep copies of the region states in 
> multiple places in the master, and these copies include information like the HRI. 
> A problem that we have observed is that when region replicas are being used and 
> there is a split, the region replica from the parent doesn't get collected from 
> the region states, and when the balancer tries to assign the old parent region 
> replica, this causes the RegionServer to create a new HRI with the 
> details of the parent, causing an inconsistency (see HBASE-18024).





[jira] [Updated] (HBASE-18563) Fix RAT License complaint about website jenkins scripts

2017-08-10 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18563:
--
Status: Patch Available  (was: Open)

> Fix RAT License complaint about website jenkins scripts
> ---
>
> Key: HBASE-18563
> URL: https://issues.apache.org/jira/browse/HBASE-18563
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Priority: Trivial
> Attachments: HBASE-18563.0001.patch
>
>
> {{2 Unknown Licenses
> *
> Files with unapproved licenses:
>   dev-support/jenkins-scripts/check-website-links.sh
>   dev-support/jenkins-scripts/generate-hbase-website.sh
> *
> }}





[jira] [Updated] (HBASE-18563) Fix RAT License complaint about website jenkins scripts

2017-08-10 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18563:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Trivial fix, committed to master.

> Fix RAT License complaint about website jenkins scripts
> ---
>
> Key: HBASE-18563
> URL: https://issues.apache.org/jira/browse/HBASE-18563
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: HBASE-18563.0001.patch
>
>
> {{2 Unknown Licenses
> *
> Files with unapproved licenses:
>   dev-support/jenkins-scripts/check-website-links.sh
>   dev-support/jenkins-scripts/generate-hbase-website.sh
> *
> }}





[jira] [Updated] (HBASE-18563) Fix RAT License complaint about website jenkins scripts

2017-08-10 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18563:
--
Attachment: HBASE-18563.0001.patch

> Fix RAT License complaint about website jenkins scripts
> ---
>
> Key: HBASE-18563
> URL: https://issues.apache.org/jira/browse/HBASE-18563
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-18563.0001.patch
>
>
> {{2 Unknown Licenses
> *
> Files with unapproved licenses:
>   dev-support/jenkins-scripts/check-website-links.sh
>   dev-support/jenkins-scripts/generate-hbase-website.sh
> *
> }}





[jira] [Updated] (HBASE-18563) Fix RAT License complaint about website jenkins scripts

2017-08-10 Thread Esteban Gutierrez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esteban Gutierrez updated HBASE-18563:
--
Fix Version/s: 3.0.0

> Fix RAT License complaint about website jenkins scripts
> ---
>
> Key: HBASE-18563
> URL: https://issues.apache.org/jira/browse/HBASE-18563
> Project: HBase
>  Issue Type: Bug
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-18563.0001.patch
>
>
> {{2 Unknown Licenses
> *
> Files with unapproved licenses:
>   dev-support/jenkins-scripts/check-website-links.sh
>   dev-support/jenkins-scripts/generate-hbase-website.sh
> *
> }}




