[jira] [Updated] (HBASE-25582) Support setting scan ReadType to be STREAM at cluster level

2021-02-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25582:
---
Fix Version/s: 2.5.0
   3.0.0-alpha-1

> Support setting scan ReadType to be STREAM at cluster level
> ---
>
> Key: HBASE-25582
> URL: https://issues.apache.org/jira/browse/HBASE-25582
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We have the cluster-level config 'hbase.storescanner.use.pread' to set the 
> ReadType to PREAD when it is not explicitly specified on the Scan object.
> In the same way, we can have a way to make scans use the STREAM read type at 
> the cluster level (when not specified at the Scan object level).
> We do not need any new configs. We already have the config 
> 'hbase.storescanner.pread.max.bytes', which specifies when to switch the read 
> type to stream and defaults to 4 * HFile block size.  If this value is 
> configured as <= 0, it means the user wants the switch at the time the 
> scanner is created itself.  With such handling we can support this.
> That way, not every scan needs to set the read type.
> The issue is that on cloud storage based systems, using stream reads might be 
> better.  We introduced the pread-based scan with tests on HDFS based 
> storage.  In my customer's case, Azure storage is in place and the WASB 
> driver is used. There is a read-ahead mechanism there (an entire block of a 
> blob is read in one REST call) and buffered in the WASB driver, which helps a 
> lot with longer scans.  With the config 'hbase.storescanner.pread.max.bytes' 
> we can make the switch happen early, but it is better to go with the 1.x 
> behaviour where the scan starts with a stream read itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25582) Support setting scan ReadType to be STREAM at cluster level

2021-02-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25582:
---
Affects Version/s: 2.0.0

> Support setting scan ReadType to be STREAM at cluster level
> ---
>
> Key: HBASE-25582
> URL: https://issues.apache.org/jira/browse/HBASE-25582
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
>
> We have the cluster-level config 'hbase.storescanner.use.pread' to set the 
> ReadType to PREAD when it is not explicitly specified on the Scan object.
> In the same way, we can have a way to make scans use the STREAM read type at 
> the cluster level (when not specified at the Scan object level).
> We do not need any new configs. We already have the config 
> 'hbase.storescanner.pread.max.bytes', which specifies when to switch the read 
> type to stream and defaults to 4 * HFile block size.  If this value is 
> configured as <= 0, it means the user wants the switch at the time the 
> scanner is created itself.  With such handling we can support this.
> That way, not every scan needs to set the read type.
> The issue is that on cloud storage based systems, using stream reads might be 
> better.  We introduced the pread-based scan with tests on HDFS based 
> storage.  In my customer's case, Azure storage is in place and the WASB 
> driver is used. There is a read-ahead mechanism there (an entire block of a 
> blob is read in one REST call) and buffered in the WASB driver, which helps a 
> lot with longer scans.  With the config 'hbase.storescanner.pread.max.bytes' 
> we can make the switch happen early, but it is better to go with the 1.x 
> behaviour where the scan starts with a stream read itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25582) Support setting scan ReadType to be STREAM at cluster level

2021-02-16 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25582:
--

 Summary: Support setting scan ReadType to be STREAM at cluster 
level
 Key: HBASE-25582
 URL: https://issues.apache.org/jira/browse/HBASE-25582
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


We have the cluster-level config 'hbase.storescanner.use.pread' to set the 
ReadType to PREAD when it is not explicitly specified on the Scan object.
In the same way, we can have a way to make scans use the STREAM read type at 
the cluster level (when not specified at the Scan object level).
We do not need any new configs. We already have the config 
'hbase.storescanner.pread.max.bytes', which specifies when to switch the read 
type to stream and defaults to 4 * HFile block size.  If this value is 
configured as <= 0, it means the user wants the switch at the time the scanner 
is created itself.  With such handling we can support this.
That way, not every scan needs to set the read type.

The issue is that on cloud storage based systems, using stream reads might be 
better.  We introduced the pread-based scan with tests on HDFS based storage.  
In my customer's case, Azure storage is in place and the WASB driver is used. 
There is a read-ahead mechanism there (an entire block of a blob is read in one 
REST call) and buffered in the WASB driver, which helps a lot with longer 
scans.  With the config 'hbase.storescanner.pread.max.bytes' we can make the 
switch happen early, but it is better to go with the 1.x behaviour where the 
scan starts with a stream read itself.
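
For reference, the per-scan read type can already be requested explicitly 
through the client API; the proposal above only concerns the cluster-level 
default used when a Scan leaves it unset. A minimal client-side sketch (the 
table name is hypothetical):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class StreamScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("my_table"))) { // hypothetical table
      Scan scan = new Scan();
      // Explicit per-scan choice; with ReadType.DEFAULT the server-side config
      // decides between PREAD and STREAM, which is what this issue is about.
      scan.setReadType(Scan.ReadType.STREAM);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          // process each row
        }
      }
    }
  }
}
{code}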



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-25540) ArrayIndexOutOfBoundsException thrown when table CF name is "#"

2021-02-15 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285039#comment-17285039
 ] 

Anoop Sam John edited comment on HBASE-25540 at 2/16/21, 5:25 AM:
--

There is a way to solve this.   pls see 
[https://stackoverflow.com/questions/18677762/handling-delimiter-with-escape-characters-in-java-string-split-method]

cc [~ram_krish]
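
A minimal, self-contained illustration (not the HBase code) of the technique in 
the linked answer: escape the delimiter inside names and split only on 
unescaped '#', so a column family literally named "#" no longer breaks the 
split:

{code:java}
public class EscapedDelimiterSplit {
  public static void main(String[] args) {
    // Failure mode: with a CF literally named "#", the naive split drops the
    // trailing empty tokens, so index 1 no longer exists.
    String naive = "table##";            // "<table>#<cf>" where cf is "#"
    String[] broken = naive.split("#");  // -> ["table"], length 1
    System.out.println(broken.length);   // 1; broken[1] would throw AIOOBE: 1

    // Technique from the linked answer: escape '#' inside tokens and split
    // only on unescaped '#' via a negative lookbehind.
    String escaped = "table#\\#";        // cf "#" encoded as "\#"
    String[] parts = escaped.split("(?<!\\\\)#");
    System.out.println(parts[1].replace("\\#", "#")); // prints "#"
  }
}
{code}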


was (Author: anoop.hbase):
There is a way to solve this.   pls see 
https://stackoverflow.com/questions/18677762/handling-delimiter-with-escape-characters-in-java-string-split-method

> ArrayIndexOutOfBoundsException thrown when table CF name is "#"
> ---
>
> Key: HBASE-25540
> URL: https://issues.apache.org/jira/browse/HBASE-25540
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Minor
>
> ArrayIndexOutOfBoundsException will be thrown when CF name is "#",
> https://github.com/apache/hbase/blob/a04ea7ea4493f5bc583b4d08a2a6a88e7c6b8c54/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsTableSourceImpl.java#L340
> {noformat}
> 2021-01-30 00:11:14,172 | ERROR | HBase-Metrics2-1 | Error getting metrics 
> from source RegionServer,sub=Tables | 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:202)
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.addGauge(MetricsTableSourceImpl.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.snapshot(MetricsTableSourceImpl.java:321)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableAggregateSourceImpl.getMetrics(MetricsTableAggregateSourceImpl.java:98)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:67)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
> at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:324)
> at com.sun.proxy.$Proxy7.postStart(Unknown Source)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
> at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25540) ArrayIndexOutOfBoundsException thrown when table CF name is "#"

2021-02-15 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285039#comment-17285039
 ] 

Anoop Sam John commented on HBASE-25540:


There is a way to solve this.   pls see 
https://stackoverflow.com/questions/18677762/handling-delimiter-with-escape-characters-in-java-string-split-method

> ArrayIndexOutOfBoundsException thrown when table CF name is "#"
> ---
>
> Key: HBASE-25540
> URL: https://issues.apache.org/jira/browse/HBASE-25540
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Minor
>
> ArrayIndexOutOfBoundsException will be thrown when CF name is "#",
> https://github.com/apache/hbase/blob/a04ea7ea4493f5bc583b4d08a2a6a88e7c6b8c54/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsTableSourceImpl.java#L340
> {noformat}
> 2021-01-30 00:11:14,172 | ERROR | HBase-Metrics2-1 | Error getting metrics 
> from source RegionServer,sub=Tables | 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:202)
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.addGauge(MetricsTableSourceImpl.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.snapshot(MetricsTableSourceImpl.java:321)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableAggregateSourceImpl.getMetrics(MetricsTableAggregateSourceImpl.java:98)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:67)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
> at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:324)
> at com.sun.proxy.$Proxy7.postStart(Unknown Source)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
> at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25540) ArrayIndexOutOfBoundsException thrown when table CF name is "#"

2021-02-15 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285037#comment-17285037
 ] 

Anoop Sam John commented on HBASE-25540:


Will this trace only get logged, or will it cause some other issues in the RS?

> ArrayIndexOutOfBoundsException thrown when table CF name is "#"
> ---
>
> Key: HBASE-25540
> URL: https://issues.apache.org/jira/browse/HBASE-25540
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Minor
>
> ArrayIndexOutOfBoundsException will be thrown when CF name is "#",
> https://github.com/apache/hbase/blob/a04ea7ea4493f5bc583b4d08a2a6a88e7c6b8c54/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsTableSourceImpl.java#L340
> {noformat}
> 2021-01-30 00:11:14,172 | ERROR | HBase-Metrics2-1 | Error getting metrics 
> from source RegionServer,sub=Tables | 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:202)
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.addGauge(MetricsTableSourceImpl.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.snapshot(MetricsTableSourceImpl.java:321)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableAggregateSourceImpl.getMetrics(MetricsTableAggregateSourceImpl.java:98)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:67)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
> at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:324)
> at com.sun.proxy.$Proxy7.postStart(Unknown Source)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
> at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25541) In WALEntryStream, set the current path to null while dequeing the log

2021-02-15 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284743#comment-17284743
 ] 

Anoop Sam John commented on HBASE-25541:


More context info pls.

> In WALEntryStream, set the current path to null while dequeing the log
> --
>
> Key: HBASE-25541
> URL: https://issues.apache.org/jira/browse/HBASE-25541
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.6.0, 1.7.0, 1.8.0
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25540) ArrayIndexOutOfBoundsException thrown when table CF name is "#"

2021-02-14 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284545#comment-17284545
 ] 

Anoop Sam John commented on HBASE-25540:


Is this a valid CF name?  As such, we support any byte[]!

> ArrayIndexOutOfBoundsException thrown when table CF name is "#"
> ---
>
> Key: HBASE-25540
> URL: https://issues.apache.org/jira/browse/HBASE-25540
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Minor
>
> ArrayIndexOutOfBoundsException will be thrown when CF name is "#",
> https://github.com/apache/hbase/blob/a04ea7ea4493f5bc583b4d08a2a6a88e7c6b8c54/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsTableSourceImpl.java#L340
> {noformat}
> 2021-01-30 00:11:14,172 | ERROR | HBase-Metrics2-1 | Error getting metrics 
> from source RegionServer,sub=Tables | 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:202)
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.addGauge(MetricsTableSourceImpl.java:336)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.snapshot(MetricsTableSourceImpl.java:321)
> at 
> org.apache.hadoop.hbase.regionserver.MetricsTableAggregateSourceImpl.getMetrics(MetricsTableAggregateSourceImpl.java:98)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
> at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:67)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:239)
> at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:324)
> at com.sun.proxy.$Proxy7.postStart(Unknown Source)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:193)
> at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:109)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24900) Make retain assignment configurable during SCP

2021-01-29 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275469#comment-17275469
 ] 

Anoop Sam John commented on HBASE-24900:


+1

> Make retain assignment configurable during SCP
> --
>
> Key: HBASE-24900
> URL: https://issues.apache.org/jira/browse/HBASE-24900
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.1.9, 2.2.5
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> HBASE-23035 change the "retain" assignment to round-robin assignment during 
> SCP which will make the failover faster and surely improve the availability, 
> but this will impact the scan performance in non-cloud scenario.
> This jira will make this assignment plan configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25446) public static method MultiTableHFileOutputFormat.configureIncrementalLoad actually cannot be invoked outside hbase package

2021-01-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260187#comment-17260187
 ] 

Anoop Sam John commented on HBASE-25446:


Seems no alternative and needs a fix. 

> public static method MultiTableHFileOutputFormat.configureIncrementalLoad 
> actually cannot be invoked outside hbase package
> --
>
> Key: HBASE-25446
> URL: https://issues.apache.org/jira/browse/HBASE-25446
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 2.4.0
>Reporter: Zhou Yuliang
>Priority: Major
>
> *MultiTableHFileOutputFormat* provides a public static method 
> _configureIncrementalLoad_ to configure the reducer for multi-table HFile 
> output. See 
> [https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableHFileOutputFormat.java#L92]
> However, the *TableInfo* type in the parameter is NOT a public class, which 
> makes it impossible to invoke 
> *MultiTableHFileOutputFormat*._configureIncrementalLoad_ from other projects. 
> Is there any alternative for this use case? For now I have to create another 
> class in the +org.apache.hadoop.hbase.mapreduce+ package to invoke 
> *MultiTableHFileOutputFormat*._configureIncrementalLoad_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25287) Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when loading files

2021-01-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260183#comment-17260183
 ] 

Anoop Sam John commented on HBASE-25287:


Do you mean a separate Jira for the backport to branch-1?  This Jira already 
has all the branch-2 based versions as fix versions.

> Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when 
> loading files
> 
>
> Key: HBASE-25287
> URL: https://issues.apache.org/jira/browse/HBASE-25287
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0, 2.4.1
>
> Attachments: 1605328358304-image.png, 1605328417888-image.png, 
> 1605504914256-image.png
>
>
> HBASE-9393 found that seek+read will leave many CLOSE_WAIT sockets without 
> stream unbuffer(), which can free the sockets and file descriptors held by 
> the stream.
> On our cluster's RSes, with about one hundred thousand store files, we found 
> the number of CLOSE_WAIT sockets increases with the number of regions opened 
> and can go up to the operating system open files limit of 100.
>  
> {code:java}
> 2020-11-12 20:19:02,452 WARN  [1282990092@qtp-220038608-1 - Acceptor0 
> SelectChannelConnector@0.0.0.0:16030] mortbay.log: EXCEPTION
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at 
> org.mortbay.jetty.nio.SelectChannelConnector$1.acceptChannel(SelectChannelConnector.java:75)
>         at 
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:686)
>         at 
> org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>         at 
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>         at 
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>         at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
>  
> {code:java}
> [hbase@gha-data-hbase-cat0053 hbase]$ ulimit -SHn
> 100
> {code}
>  
>  
> The reason for the problem is that when a store file is opened,
> {code:java}
> private void open() throws IOException {
>   fileInfo.initHDFSBlocksDistribution();
>   long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
>   ReaderContext context = fileInfo.createReaderContext(false, readahead, 
> ReaderType.PREAD);
>   fileInfo.initHFileInfo(context);
>   StoreFileReader reader = fileInfo.preStoreFileReaderOpen(context, 
> cacheConf);
>   if (reader == null) {
> reader = fileInfo.createReader(context, cacheConf);
> fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
>   }
> {code}
> only createReader() unbuffers the stream. initMetaAndIndex() also uses the 
> stream to read blocks, so it needs to unbuffer() the socket too.
> We can just add a try before fileInfo.initHFileInfo(context); and finally 
> unbuffer() the stream at the end of the open() function.
> We fixed it on our cluster, and the number of CLOSE_WAIT sockets dropped to 
> about 0. 
>  
>  
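
For illustration, a minimal sketch of the try/finally arrangement described 
above, not the committed patch; it assumes the stream-wrapper accessor 
(context.getInputStreamWrapper().unbuffer()) used elsewhere in this code path:

{code:java}
private void open() throws IOException {
  fileInfo.initHDFSBlocksDistribution();
  long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
  ReaderContext context = fileInfo.createReaderContext(false, readahead, ReaderType.PREAD);
  StoreFileReader reader = null;
  try {
    fileInfo.initHFileInfo(context);
    reader = fileInfo.preStoreFileReaderOpen(context, cacheConf);
    if (reader == null) {
      reader = fileInfo.createReader(context, cacheConf);
      fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
    }
  } finally {
    // Drop the socket/file descriptor held by the stream whether or not the
    // reader was created successfully, so no CLOSE_WAIT socket is left behind.
    context.getInputStreamWrapper().unbuffer();
  }
  // ... rest of open() unchanged
}
{code}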



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25445) Old WALs archive fails in procedure based WAL split

2021-01-06 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25445:
---
Summary: Old WALs archive fails in procedure based WAL split  (was: Old 
WALs archive fails in procedure based WAL)

> Old WALs archive fails in procedure based WAL split
> ---
>
> Key: HBASE-25445
> URL: https://issues.apache.org/jira/browse/HBASE-25445
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.2
>Reporter: mokai
>Assignee: Anjan Das
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.4, 2.5.0, 2.4.1
>
> Attachments: ServerCrashWrongFSError.png
>
>
> If 'hbase.wal.dir' and 'hbase.rootdir' are configured to different 
> filesystems, SplitWALRemoteProcedure fails to archive the split WAL since 
> SplitWALManager uses the wrong fs instance. SplitWALManager should use the fs 
> instance corresponding to the WAL.
> Steps to Reproduce:
>  * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to 
> different fs instances.
>  * Start HBase with multiple RS. 
>  * Create a couple of tables and some rows in them so that the RSs get 
> assigned with some regions. 
>  * Take any RS with non-zero number of regions offline. 
>  * Check master logs for "Wrong FS" error as shown in the screenshot 
> attached. 
>  
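
For context, a small sketch (standard Hadoop API, not the actual patch) of 
resolving the filesystem from the WAL directory itself rather than from 
hbase.rootdir:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalFsExample {
  // Returns the filesystem backing the WAL directory, which can differ from the
  // one backing hbase.rootdir when 'hbase.wal.dir' is configured separately.
  static FileSystem walFileSystem(Configuration conf) throws IOException {
    String walDir = conf.get("hbase.wal.dir", conf.get("hbase.rootdir"));
    return new Path(walDir).getFileSystem(conf);
  }
}
{code}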



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25445) Old WALs archive fails in procedure based WAL

2021-01-06 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25445:
---
Summary: Old WALs archive fails in procedure based WAL  (was: 
SplitWALRemoteProcedure failed to archive split WAL)

> Old WALs archive fails in procedure based WAL
> -
>
> Key: HBASE-25445
> URL: https://issues.apache.org/jira/browse/HBASE-25445
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.2
>Reporter: mokai
>Assignee: Anjan Das
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.4, 2.5.0, 2.4.1
>
> Attachments: ServerCrashWrongFSError.png
>
>
> If 'hbase.wal.dir' and 'hbase.rootdir' are configured to different 
> filesystems, SplitWALRemoteProcedure fails to archive the split WAL since 
> SplitWALManager uses the wrong fs instance. SplitWALManager should use the fs 
> instance corresponding to the WAL.
> Steps to Reproduce:
>  * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to 
> different fs instances.
>  * Start HBase with multiple RS. 
>  * Create a couple of tables and some rows in them so that the RSs get 
> assigned with some regions. 
>  * Take any RS with non-zero number of regions offline. 
>  * Check master logs for "Wrong FS" error as shown in the screenshot 
> attached. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25445) SplitWALRemoteProcedure failed to archive split WAL

2021-01-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259537#comment-17259537
 ] 

Anoop Sam John commented on HBASE-25445:


Has it been this way since procedure-based WAL split was introduced?  Can you 
please update the affected versions accordingly?

> SplitWALRemoteProcedure failed to archive split WAL
> ---
>
> Key: HBASE-25445
> URL: https://issues.apache.org/jira/browse/HBASE-25445
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.4.1
>Reporter: mokai
>Assignee: Anjan Das
>Priority: Critical
> Attachments: ServerCrashWrongFSError.png
>
>
> If 'hbase.wal.dir' and 'hbase.rootdir' are configured to different 
> filesystems, SplitWALRemoteProcedure fails to archive the split WAL since 
> SplitWALManager uses the wrong fs instance. SplitWALManager should use the fs 
> instance corresponding to the WAL.
> Steps to Reproduce:
>  * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to 
> different fs instances.
>  * Start HBase with multiple RS. 
>  * Create a couple of tables and some rows in them so that the RSs get 
> assigned with some regions. 
>  * Take any RS with non-zero number of regions offline. 
>  * Check master logs for "Wrong FS" error as shown in the screenshot 
> attached. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25445) SplitWALRemoteProcedure failed to archive split WAL

2021-01-05 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259419#comment-17259419
 ] 

Anoop Sam John commented on HBASE-25445:


Does this issue arise only when the WAL split is NOT managed by ZK?  Does it 
work with ZK-managed split?  Trying to understand; sorry, I did not see the 
patch.

> SplitWALRemoteProcedure failed to archive split WAL
> ---
>
> Key: HBASE-25445
> URL: https://issues.apache.org/jira/browse/HBASE-25445
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.4.1
>Reporter: mokai
>Assignee: Anjan Das
>Priority: Critical
> Attachments: ServerCrashWrongFSError.png
>
>
> If 'hbase.wal.dir' and 'hbase.rootdir' are configured to different 
> filesystems, SplitWALRemoteProcedure fails to archive the split WAL since 
> SplitWALManager uses the wrong fs instance. SplitWALManager should use the fs 
> instance corresponding to the WAL.
> Steps to Reproduce:
>  * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to 
> different fs instances.
>  * Start HBase with multiple RS. 
>  * Create a couple of tables and some rows in them so that the RSs get 
> assigned with some regions. 
>  * Take any RS with non-zero number of regions offline. 
>  * Check master logs for "Wrong FS" error as shown in the screenshot 
> attached. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25441) add security check for some APIs in RSRpcServices

2021-01-04 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258054#comment-17258054
 ] 

Anoop Sam John commented on HBASE-25441:


Can you add release notes with the API names and the rights expected to perform 
each op?  All of them need admin access level, right?

> add security check for some APIs in RSRpcServices
> -
>
> Key: HBASE-25441
> URL: https://issues.apache.org/jira/browse/HBASE-25441
> Project: HBase
>  Issue Type: Bug
>Reporter: lujie
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.3.4, 2.5.0, 2.4.1
>
>
>  
> ||API||Severity||Symptom||
> |clearRegionBlockCache|Severe|The API will call 
> LruBlockCache.evictBlocksByHfileName, which is declared as an expensive 
> operation (see its comments), so a non-admin may cause a DoS.|
> |clearSlowLogsResponses|Normal|Clears queue records from the ring buffer.|
> |updateConfiguration|Normal|A non-admin user can make the RS reload its 
> configuration from disk via this API.|
> |updateRegionFavoredNodesMapping|Normal|A non-admin user can change a 
> region's best storage location via this API.|
> |stopServer|Low|stopServer on the RS is silent, which makes the client think 
> it successfully shut down the RS. 
>  Adding preRpcCheck not only lets the client receive the failure message 
>  but also prevents a non-admin user from stopping the RS, 
>  even when hbase.coprocessor.regionserver.classes is not configured.|
>  
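
A hedged sketch of the kind of admin pre-check the table above refers to; the 
names and structure here are assumptions for illustration, not the RSRpcServices 
code or the committed change:

{code:java}
import java.io.IOException;

// Illustrative only: a standalone mock of the admin pre-check idea, with
// hypothetical names; it is not the RSRpcServices code or the actual patch.
public class AdminPreCheckSketch {
  enum Action { ADMIN, READ, WRITE }

  static void requirePermission(String op, Action needed, boolean callerIsAdmin)
      throws IOException {
    if (needed == Action.ADMIN && !callerIsAdmin) {
      throw new IOException("Insufficient permissions for " + op);
    }
  }

  // Each sensitive RPC (clearRegionBlockCache, updateConfiguration, stopServer, ...)
  // would run a check like this before doing any work.
  static void clearRegionBlockCache(boolean callerIsAdmin) throws IOException {
    requirePermission("clearRegionBlockCache", Action.ADMIN, callerIsAdmin);
    // ... expensive cache eviction would happen here
  }

  public static void main(String[] args) throws IOException {
    clearRegionBlockCache(true);    // admin caller: allowed
    try {
      clearRegionBlockCache(false); // non-admin caller: rejected up front
    } catch (IOException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}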



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25443) Improve the experience of using the Master webpage by change the loading process of snapshot list to asynchronous

2020-12-24 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254473#comment-17254473
 ] 

Anoop Sam John commented on HBASE-25443:


+1. I was about to raise this improvement.
Will you work on a PR? Thanks.

> Improve the experience of using the Master webpage by change the loading 
> process of snapshot list to asynchronous
> -
>
> Key: HBASE-25443
> URL: https://issues.apache.org/jira/browse/HBASE-25443
> Project: HBase
>  Issue Type: Improvement
>  Components: master, UI
>Affects Versions: 3.0.0-alpha-1
>Reporter: Zhuoyue Huang
>Assignee: Zhuoyue Huang
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
> Attachments: image-2020-12-24-13-17-17-213.png
>
>
> Background: When there are many snapshots, loading the master webpage is very 
> slow, which affects the experience. (Our cluster has more than 3000  
> snapshots, and it takes about 10 seconds to load the master webpage each time)
>  
> 1. The snapshot list is not in the master's memory; HDFS needs to be scanned 
> when loading.
> 2. Changing the process of loading snapshots to asynchronous can improve the 
> experience.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25246) Backup/Restore hbase cell tags.

2020-12-16 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250416#comment-17250416
 ] 

Anoop Sam John commented on HBASE-25246:


IMO it's a bug fix only, and so it should go to all active patch branches.

> Backup/Restore hbase cell tags.
> ---
>
> Key: HBASE-25246
> URL: https://issues.apache.org/jira/browse/HBASE-25246
> Project: HBase
>  Issue Type: Improvement
>  Components: backuprestore
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.1
>
>
> In PHOENIX-6213 we are planning to add cell tags for Delete mutations. After 
> having a discussion with hbase community via dev mailing thread, it was 
> decided that we will pass the tags via an attribute in Mutation object and 
> persist them to hbase via phoenix co-processor. The intention of PHOENIX-6213 
> is to store metadata in Delete marker so that while running Restore tool we 
> can selectively restore certain Delete markers and ignore others. For that to 
> happen we need to persist these tags in Backup and retrieve them in Restore 
> MR jobs (Import/Export tool). 
> Currently we don't persist the tags in Backup. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-14 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249445#comment-17249445
 ] 

Anoop Sam John commented on HBASE-24850:


[~ram_krish]  As per the perf test analysis, what is causing the perf 
regression on the CellComparator path? Is it too much branching so it does not 
get inlined? Or is it the calls to decode int/short (many more than in 1.x, 
since the KV methods are not on Cell)?
It is not just the Comparator knowing about the contiguous key part. KV had 
methods exposing this stuff which are missing in Cell and its subclasses, so 
bringing them back (not into Cell) is another area.  Can we have compare(Cell) 
in ExtendedCell?  Sorry, I did not see the patch.  I saw Stack raised a concern 
over the branching, and so I am asking; in that case we need to see how to 
handle that area as well.  The perf test analysis will help here with the 
decision making.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation.  In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer (again, the gain is huge 
> because of the large number of compare ops that test is doing).  This issue 
> is to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25390) CopyTable and Coprocessor based export tool should backup and restore cell tags.

2020-12-14 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249010#comment-17249010
 ] 

Anoop Sam John commented on HBASE-25390:


All tools that work based on region-level scans will have the issue of not 
backing up cell tags.

> CopyTable and Coprocessor based export tool should backup and restore cell 
> tags.
> 
>
> Key: HBASE-25390
> URL: https://issues.apache.org/jira/browse/HBASE-25390
> Project: HBase
>  Issue Type: Improvement
>  Components: backuprestore
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
>
> In HBASE-25246 we added support for Mapreduce based Export/Import tool to 
> backup/restore cell tags. Mapreduce based export tool is not the only tool 
> that takes snapshot or backup of a given table.
> We also have Coprocessor based Export and CopyTable tools which takes backup 
> of a given table. We need to add support for the above 2 tools to save cell 
> tags to file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25032) Wait for region server to become online before adding it to online servers in Master

2020-12-11 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248303#comment-17248303
 ] 

Anoop Sam John commented on HBASE-25032:


What is the approach here?
bq. What else other than this replication setup? Can you put it all down? We 
can think of anything else which might be time consuming. That can really help 
to decide whether we really need yet another step of informing the HM from the 
RS that it is ready for taking up the regions load.
Can you please answer the above question?

> Wait for region server to become online before adding it to online servers in 
> Master
> 
>
> Key: HBASE-25032
> URL: https://issues.apache.org/jira/browse/HBASE-25032
> Project: HBase
>  Issue Type: Bug
>Reporter: Sandeep Guggilam
>Assignee: Caroline
>Priority: Major
>
> As part of RS start up, RS reports for duty to Master . Master acknowledges 
> the request and adds it to the onlineServers list for further assigning any 
> regions to the RS
> Once Master acknowledges the reportForDuty and sends back the response, RS 
> does a bunch of stuff like initializing replication sources etc before 
> becoming online. However, sometimes there could be an issue with initializing 
> replication sources when it is unable to connect to peer clusters because of 
> some kerberos configuration and there would be a delay of around 20 mins in 
> becoming online.
>  
> Since the master considers it online, it tries to assign regions, which fails 
> with a ServerNotRunningYet exception; then the master tries to unassign, which 
> again fails with the same exception, leading the region to FAILED_CLOSE state.
>  
> It would be good to have a check to see if the RS is ready to accept the 
> assignment requests before adding it to online servers list which would 
> account for any such delays as described above



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25378) Legacy comparator in Hfile trailer will fail to load

2020-12-09 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246648#comment-17246648
 ] 

Anoop Sam John commented on HBASE-25378:


At first thought it looks like a data compatibility break.
 
But it seems this won't create a compatibility issue of being unable to read 
HFiles generated by an old cluster, because when we write the comparator class 
name in the FixedFileTrailer (FFT), we still use the 1.x based comparator names.
See FFT#toProtobuf():
{code}
.setComparatorClassName(getHBase1CompatibleName(comparatorClassName))
..
private String getHBase1CompatibleName(final String comparator) {
  if (comparator.equals(CellComparatorImpl.class.getName())) {
    return KeyValue.COMPARATOR.getClass().getName();
  }
  if (comparator.equals(MetaCellComparator.class.getName())) {
    return KeyValue.META_COMPARATOR.getClass().getName();
  }
  return comparator;
}
{code}
Though you can confirm with functional tests once. 

> Legacy comparator in Hfile trailer will fail to load
> 
>
> Key: HBASE-25378
> URL: https://issues.apache.org/jira/browse/HBASE-25378
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.4
>
>
> HBASE-24968 moved MetaCellComparator out from CellComparatorImpl to avoid the 
> deadlock issue. But this introduced compatibility issue, old hfile with 
> comparator class as 
> "org.apache.hadoop.hbase.CellComparator$MetaCellComparator" will fail to open 
> due to ClassNotFoundException.
> Also we should also handle the case when comparator class is 
> "org.apache.hadoop.hbase.CellComparatorImpl$MetaCellComparator", which was 
> case before HBASE-24968.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25378) Legacy comparator in Hfile trailer will fail to load

2020-12-09 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25378:
---
Fix Version/s: 2.3.4
   2.4.0

> Legacy comparator in Hfile trailer will fail to load
> 
>
> Key: HBASE-25378
> URL: https://issues.apache.org/jira/browse/HBASE-25378
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.4
>
>
> HBASE-24968 moved MetaCellComparator out from CellComparatorImpl to avoid the 
> deadlock issue. But this introduced incompatibility issue, old hfile with 
> comparator class as 
> "org.apache.hadoop.hbase.CellComparator$MetaCellComparator" will fail to open 
> due to ClassNotFoundException.
> Also we should also handle the case when comparator class is 
> "org.apache.hadoop.hbase.CellComparatorImpl$MetaCellComparator", which was 
> case before HBASE-24968.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2020-12-07 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245711#comment-17245711
 ] 

Anoop Sam John commented on HBASE-21874:


Ya, this is not supported as per this Jira.  Mind opening a new Jira? Would you 
like to provide a patch then?  We can refer to FileIOEngine for multiple-path 
support.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: HBASE-21874.branch-2.2.001.patch, HBASE-21874.patch, 
> HBASE-21874.patch, HBASE-21874_V2.patch, HBASE-21874_V4.patch, 
> HBASE-21874_V5.patch, HBASE-21874_V6.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25032) Wait for region server to become online before adding it to online servers in Master

2020-12-03 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243744#comment-17243744
 ] 

Anoop Sam John commented on HBASE-25032:


{quote}
I spent some time looking at the code today. One thing I noticed is that we 
abort the RS by throwing exception in case of any issues with replication setup 
with the peer during the startup of RS : 
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java#L241

So looks like the current design already treats some aspects of setting up the 
replication as important and aborts the RS if not setup properly as opposed to 
our thought of letting RS accept requests even if replication fails in an async 
thread
{quote}
If replication is enabled and we cannot set it up in an RS instance, aborting 
that RS looks correct; otherwise the data in this RS would never get 
replicated.  Once the RS aborts, its WAL replication queue will get assigned to 
another healthy RS.  So it should be fine to try this replication setup in an 
async thread.  It may take some time, and until then all writes will be in the 
backlog and get replicated later.  If the replication setup still fails after 
the attempt (a rare chance anyway), it will abort the RS then.
bq. Once Master acknowledges the reportForDuty and sends back the response, RS 
does a bunch of stuff like initializing replication sources etc before becoming 
online. 
What else other than this replication setup? Can you put it all down? We can 
think of anything else which might be time consuming. That can really help to 
decide whether we really need yet another step of informing the HM from the RS 
that it is ready for taking up the regions load.


> Wait for region server to become online before adding it to online servers in 
> Master
> 
>
> Key: HBASE-25032
> URL: https://issues.apache.org/jira/browse/HBASE-25032
> Project: HBase
>  Issue Type: Bug
>Reporter: Sandeep Guggilam
>Assignee: Caroline
>Priority: Major
>
> As part of RS start up, RS reports for duty to Master . Master acknowledges 
> the request and adds it to the onlineServers list for further assigning any 
> regions to the RS
> Once Master acknowledges the reportForDuty and sends back the response, RS 
> does a bunch of stuff like initializing replication sources etc before 
> becoming online. However, sometimes there could be an issue with initializing 
> replication sources when it is unable to connect to peer clusters because of 
> some kerberos configuration and there would be a delay of around 20 mins in 
> becoming online.
>  
> Since the master considers it online, it tries to assign regions, which fails 
> with a ServerNotRunningYet exception; then the master tries to unassign, which 
> again fails with the same exception, leading the region to FAILED_CLOSE state.
>  
> It would be good to have a check to see if the RS is ready to accept the 
> assignment requests before adding it to online servers list which would 
> account for any such delays as described above



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-18 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25026.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-18 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25026:
---
Fix Version/s: 2.4.0

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-16 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233235#comment-17233235
 ] 

Anoop Sam John commented on HBASE-25026:


Pushed to master.  [~gouravk]  Please add release notes explaining the new 
metric and its meaning. Thanks for the contribution.

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25026:
---
Affects Version/s: (was: 2.4.0)

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25026:
---
Fix Version/s: 3.0.0-alpha-1

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25026:
---
Summary: Create a metric to track full region scans RPCs  (was: Create a 
metric to track scans that have no start row and/or stop row)

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
>
> A metric that indicates how many of the scan requests were without start row 
> and/or stop row. Generally such queries may be wrongly written or may require 
> better schema design and those may be some queries doing some sanity check to 
> verify if their actual application logic has done the necessary updates and 
> the all that expected rows are processed. 
> We do have some logs at the RPC layer to see what queries take time but 
> nothing as a metric. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25287) Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when loading files

2020-11-16 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232621#comment-17232621
 ] 

Anoop Sam John commented on HBASE-25287:


Are you planning to put up a PR with the fix?

> Forgetting to unbuffer streams results in many CLOSE_WAIT sockets when 
> loading files
> 
>
> Key: HBASE-25287
> URL: https://issues.apache.org/jira/browse/HBASE-25287
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Xiaolin Ha
>Priority: Major
> Attachments: 1605328358304-image.png, 1605328417888-image.png, 
> 1605504914256-image.png
>
>
> HBASE-9393 found that seek+read leaves many CLOSE_WAIT sockets unless the 
> stream is unbuffered; unbuffer() frees the sockets and file descriptors held 
> by the stream. 
> In our cluster, with RSes holding about one hundred thousand store files, we 
> found the number of CLOSE_WAIT sockets increases with the number of regions 
> opened, and can reach the operating system open files limit of 100.
>  
> {code:java}
> 2020-11-12 20:19:02,452 WARN  [1282990092@qtp-220038608-1 - Acceptor0 
> SelectChannelConnector@0.0.0.0:16030] mortbay.log: EXCEPTION
> java.io.IOException: Too many open files
>         at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>         at 
> org.mortbay.jetty.nio.SelectChannelConnector$1.acceptChannel(SelectChannelConnector.java:75)
>         at 
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:686)
>         at 
> org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>         at 
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>         at 
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>         at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
>  
> {code:java}
> [hbase@gha-data-hbase-cat0053 hbase]$ ulimit -SHn
> 100
> {code}
>  
>  
> The reason for the problem is that when a store file is opened, 
> {code:java}
> private void open() throws IOException {
>   fileInfo.initHDFSBlocksDistribution();
>   long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
>   ReaderContext context = fileInfo.createReaderContext(false, readahead, 
> ReaderType.PREAD);
>   fileInfo.initHFileInfo(context);
>   StoreFileReader reader = fileInfo.preStoreFileReaderOpen(context, 
> cacheConf);
>   if (reader == null) {
> reader = fileInfo.createReader(context, cacheConf);
> fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
>   }
> {code}
> only createReader() unbuffers the stream. initMetaAndIndex() also uses the 
> stream to read blocks, so it needs to unbuffer() the socket too.
> We can just add a try before fileInfo.initHFileInfo(context); and unbuffer() 
> the stream in a finally at the end of the open() function.
> We fixed it on our cluster and the number of CLOSE_WAIT sockets dropped to 
> about 0. 
>  
>  
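A minimal sketch of the fix the description proposes: wrap the reader setup in try/finally so the stream is always unbuffered at the end of open(). The context.getInputStreamWrapper().unbuffer() call is an assumption about how the underlying stream wrapper is reached; the actual patch may do this differently.

{code:java}
private void open() throws IOException {
  fileInfo.initHDFSBlocksDistribution();
  long readahead = fileInfo.isNoReadahead() ? 0L : -1L;
  ReaderContext context = fileInfo.createReaderContext(false, readahead, ReaderType.PREAD);
  try {
    fileInfo.initHFileInfo(context);
    StoreFileReader reader = fileInfo.preStoreFileReaderOpen(context, cacheConf);
    if (reader == null) {
      reader = fileInfo.createReader(context, cacheConf);
      // initMetaAndIndex() reads blocks through the same stream, so without a
      // later unbuffer() the socket is left in CLOSE_WAIT.
      fileInfo.getHFileInfo().initMetaAndIndex(reader.getHFileReader());
    }
    // ... rest of open() as before ...
  } finally {
    // Assumed accessor: release the socket/file descriptor held by the pread
    // stream once the open sequence is finished.
    context.getInputStreamWrapper().unbuffer();
  }
}
{code}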



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25260) upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because it cannot find hbase:namespace table

2020-11-15 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232548#comment-17232548
 ] 

Anoop Sam John commented on HBASE-25260:


What about the WAL system? Did you happen to delete/change the WAL FS between 
stopping the 2.0.x cluster and starting the new upgraded cluster?

> upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because 
> it cannot find hbase:namespace table
> -
>
> Key: HBASE-25260
> URL: https://issues.apache.org/jira/browse/HBASE-25260
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.1, 2.0.6
>Reporter: Yongle Zhang
>Priority: Major
> Attachments: hmaster.log
>
>
> When we upgraded the HBase cluster from 2.0.6 to 2.1.1, the HMaster on the 
> upgraded node failed to start.
> Some stack trace in the error log:
> {code:java}
> 2020-11-06 02:01:26,420 WARN  [PEWorker-12] 
> assignment.RegionTransitionProcedure: Failed transition, suspend 1secs 
> pid=12, ppid=9, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; 
> AssignProcedure table=TestTable, region=37d62d2c1934da269a592e0e5cbca82a; 
> rit=OFFLINE, location=null; waiting on rectified condition fixed by other 
> Procedure or operator intervention
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> TestTable
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignProcedure.assign(AssignProcedure.java:194)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignProcedure.startTransition(AssignProcedure.java:205)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:355)
>   at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)
> {code}
> It seems to be caused by not being able to find the hbase:namespace table after the upgrade: 
> {code:java}
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: []
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master 399fd6ca0c6d,16000,1604628075265: 
> Unhandled exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>   at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
>   at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
>   at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1253)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1031)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2254)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.TableNotFoundException: hbase:namespace
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:864)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:759)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:716)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:594)
>   at 
> 

[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-11-09 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228932#comment-17228932
 ] 

Anoop Sam John commented on HBASE-25065:


[~ram_krish] please add a release note to highlight the configs.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we clean up logs once we ensure that the region data has been 
> flushed. We track the sequence numbers, and if the sequence number for every 
> region covered by a rolled WAL has been flushed, that WAL can be archived.
> When we have around ~50 files to archive (per RS), we do the archiving one 
> after the other. Since archiving is nothing but a rename operation, it adds 
> to the metadata operation load of a cloud-based FS. 
> Not only that, the entire archival is done inside the rollWriterLock. Even 
> though we have closed the old writer, created a new writer, and writes are 
> ongoing, we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size beyond the 
> default size configured (when we have a steady stream of writes). 
> So the proposal is to move the log archival to a separate thread and add 
> some kind of throttling or batching so that we don't do all the archival in 
> one shot. 
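A hedged sketch of the proposal, not the actual HBASE-25065 implementation: the roll path only hands the rolled files to a dedicated archiver thread, so the rename-heavy work happens outside the rollWriterLock. The class and method names are illustrative.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative only: asynchronous, single-threaded WAL archival. */
class AsyncWalArchiver {
  private final FileSystem fs;
  private final Path oldWalDir;
  private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "wal-archiver");
    t.setDaemon(true);
    return t;
  });

  AsyncWalArchiver(FileSystem fs, Path oldWalDir) {
    this.fs = fs;
    this.oldWalDir = oldWalDir;
  }

  /** Called after the roll has switched writers; returns immediately. */
  void submitForArchival(List<Path> rolledWals) {
    pool.execute(() -> {
      for (Path wal : rolledWals) {
        try {
          // Archival is just a rename into the global oldWALs directory;
          // doing it one file at a time here is a simple form of batching.
          boolean moved = fs.rename(wal, new Path(oldWalDir, wal.getName()));
          if (!moved) {
            // Leave the file in place; a later roll or cleaner can retry.
          }
        } catch (IOException e) {
          // Same: leave the file for a retry rather than failing the roll.
        }
      }
    });
  }
}
{code}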



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25251) Enable configuration based enable/disable of Unsafe package usage

2020-11-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227600#comment-17227600
 ] 

Anoop Sam John commented on HBASE-25251:


bq. I already committed a patch which does that
Oh, I missed it or forgot about it.
Thanks for the detailed explanation. +1 for this improvement.

> Enable configuration based enable/disable of Unsafe package usage
> -
>
> Key: HBASE-25251
> URL: https://issues.apache.org/jira/browse/HBASE-25251
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sandeep Guggilam
>Assignee: Sandeep Guggilam
>Priority: Major
>
> We need to provide a way for clients to disable Unsafe package usage. 
> Currently there is no way for clients to specify that they don't want to use 
> Unsafe-based conversion for Bytes conversions.
> As a result there could be issues with missing Unsafe methods when the 
> client is on JDK 11. With this change clients can disable Unsafe usage and 
> fall back to the normal conversion if they want to.
> Also, we use static references to the Unsafe availability in the Bytes 
> class, assuming that the availability is set during class loading and no one 
> can ever override it later. Now that we plan to expose a util for clients to 
> override the availability if required, we need to avoid the static 
> references when computing the availability for the comparisons.
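A minimal sketch of the idea, combining the configuration switch described here with the method-availability probe suggested in the comment further below. The property name and the methods probed are assumptions for illustration, not the ones the patch introduces.

{code:java}
/** Illustrative only: decide at runtime whether Unsafe-based Bytes conversion is used. */
final class UnsafeToggle {
  private UnsafeToggle() {
  }

  /** Re-evaluated on each call instead of being captured in a static final flag. */
  static boolean useUnsafe() {
    // Assumed property name; lets a client opt out explicitly.
    boolean disabledByUser = Boolean.getBoolean("hbase.client.unsafe.disabled");
    return !disabledByUser && unsafeReallyAvailable();
  }

  /** Probe the methods we need instead of only checking that the class loads. */
  private static boolean unsafeReallyAvailable() {
    try {
      Class<?> clazz = Class.forName("sun.misc.Unsafe");
      clazz.getDeclaredMethod("getLong", Object.class, long.class);
      clazz.getDeclaredMethod("copyMemory",
          Object.class, long.class, Object.class, long.class, long.class);
      return true;
    } catch (ReflectiveOperationException | SecurityException e) {
      return false;
    }
  }
}
{code}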



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25251) Enable configuration based enable/disable of Unsafe package usage

2020-11-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227349#comment-17227349
 ] 

Anoop Sam John commented on HBASE-25251:


Can we consider tightening our Unsafe availability checker to also verify that 
all the required methods are really available? It would be good if our logic 
could decide whether to use Unsafe rather than asking the customer to 
configure it. Thoughts?

> Enable configuration based enable/disable of Unsafe package usage
> -
>
> Key: HBASE-25251
> URL: https://issues.apache.org/jira/browse/HBASE-25251
> Project: HBase
>  Issue Type: Improvement
>Reporter: Sandeep Guggilam
>Assignee: Sandeep Guggilam
>Priority: Major
>
> We need to provide a way for clients to disable Unsafe package usage. 
> Currently there is no way for clients to specify that they don't want to use 
> Unsafe-based conversion for Bytes conversions.
> As a result there could be issues with missing Unsafe methods when the 
> client is on JDK 11. With this change clients can disable Unsafe usage and 
> fall back to the normal conversion if they want to.
> Also, we use static references to the Unsafe availability in the Bytes 
> class, assuming that the availability is set during class loading and no one 
> can ever override it later. Now that we plan to expose a util for clients to 
> override the availability if required, we need to avoid the static 
> references when computing the availability for the comparisons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25239) Upgrading HBase from 2.2.0/2.3.3 to master(3.0.0) fails because HMaster “Failed to become active master”

2020-11-06 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227348#comment-17227348
 ] 

Anoop Sam John commented on HBASE-25239:


It seems TableNamespaceManager.migrateNamespaceTable is not really waiting for 
the namespace table to come online? I have not read the code.  

> Upgrading HBase from 2.2.0/2.3.3 to master(3.0.0) fails because HMaster 
> “Failed to become active master”
> 
>
> Key: HBASE-25239
> URL: https://issues.apache.org/jira/browse/HBASE-25239
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.3
>Reporter: Zhuqi Jin
>Priority: Major
>
> When we upgraded the HBase cluster from 2.2.0/2.3.3 to 
> master (c303f9d329d578d31140e507bdbcbe3aa097042b), the HMaster on the 
> upgraded node failed to start.
> The error message is shown below:
> {code:java}
> 2020-11-03 02:52:27,809 ERROR [master/65cddff041f6:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active 
> masterjava.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILEDat 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)at
>  
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)at
>  
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1362)at
>  
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1137)at
>  
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2245)at
>  org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:626)at 
> java.lang.Thread.run(Thread.java:748)Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues:at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)at
>  
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)at
>  
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)at
>  
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:93)at
>  
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:123)at
>  
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)at
>  
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:249)at
>  
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1360)...
>  4 more2020-11-03 02:52:27,810 ERROR 
> [master/65cddff041f6:16000:becomeActiveMaster] master.HMaster: Master server 
> abort: loaded coprocessors are: []2020-11-03 02:52:27,810 ERROR 
> [master/65cddff041f6:16000:becomeActiveMaster] master.HMaster: * ABORTING 
> master 65cddff041f6,16000,1604371935915: Unhandled exception. Starting 
> shutdown. *java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILEDat 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)at
>  
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)at
>  
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1362)at
>  
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1137)at
>  
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2245)at
>  org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:626)at 
> java.lang.Thread.run(Thread.java:748)Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues:at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)at
>  
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)at
>  
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)at
>  
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:93)at
>  
> 

[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-04 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226485#comment-17226485
 ] 

Anoop Sam John commented on HBASE-25238:


Actually, upgrades from 2.0.x or 2.1.x to 2.2.0+ versions will have this issue.  
Here the test was from 2.2.0 RC0, right? The breaking change went into the 
2.2.0 release itself. Can you change the Jira title and description?

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Critical
>
> When we upgraded the HBase cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the 
> HMaster on the upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active 
> masterorg.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state   at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)2020-11-02 23:04:01,998 ERROR 
> [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: * ABORTING 
> master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting 
> shutdown. 
> *org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state   at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    

[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-03 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225302#comment-17225302
 ] 

Anoop Sam John commented on HBASE-25238:


HBASE-22074 added this required PB field. It is fixed in 2.2.0.
So we can say upgrades from 2.1.x to 2.2.0+ versions will have this issue (?)

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Major
>
> When we upgraded the HBase cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the 
> HMaster on the upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active 
> masterorg.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state   at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)2020-11-02 23:04:01,998 ERROR 
> [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: * ABORTING 
> master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting 
> shutdown. 
> *org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state   at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> 

[jira] [Commented] (HBASE-25224) Maximize sleep for checking meta and namespace regions availability

2020-10-31 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224095#comment-17224095
 ] 

Anoop Sam John commented on HBASE-25224:


Just updated the release note to describe the change.

> Maximize sleep for checking meta and namespace regions availability
> ---
>
> Key: HBASE-25224
> URL: https://issues.apache.org/jira/browse/HBASE-25224
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> The isRegionOnline method in HMaster is used on Master startup to check the 
> availability of the hbase:meta and hbase:namespace tables.
> I ran into an issue where the namespace table was not online and the Master 
> was just waiting there. I used HBCK2 to fix the cluster, but the 
> initialization did not complete because the RetryCounterFactory was already 
> sleeping for 10+ hours.
> Since the Master is waiting in an idle state, it does no harm to check the 
> region availability more frequently and cap the maximum sleep time.
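A minimal sketch of the capped backoff this change describes; the RetryCounterFactory wiring in HMaster is elided and the loop below is illustrative only.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Illustrative only: wait for a region to come online with a capped backoff. */
final class CappedBackoffWait {
  static void waitFor(BooleanSupplier regionOnline) throws InterruptedException {
    long sleepMs = 1_000L;                                  // initial retry interval
    final long maxSleepMs = TimeUnit.SECONDS.toMillis(60);  // the new 60 sec cap
    while (!regionOnline.getAsBoolean()) {
      Thread.sleep(sleepMs);
      // Back off exponentially, but never sleep longer than the cap, so an
      // operator fix (e.g. via HBCK2) is noticed within at most one minute.
      sleepMs = Math.min(sleepMs * 2, maxSleepMs);
    }
  }
}
{code}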



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25224) Maximize sleep for checking meta and namespace regions availability

2020-10-31 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25224:
---
Release Note: Changed the max sleep time during the meta and namespace region 
availability check to 60 sec. Previously there was no such cap.

> Maximize sleep for checking meta and namespace regions availability
> ---
>
> Key: HBASE-25224
> URL: https://issues.apache.org/jira/browse/HBASE-25224
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> The isRegionOnline method in HMaster is used on Master startup to check the 
> availability of the hbase:meta and hbase:namespace tables.
> I ran into an issue where the namespace table was not online and the Master 
> was just waiting there. I used HBCK2 to fix the cluster, but the 
> initialization did not complete because the RetryCounterFactory was already 
> sleeping for 10+ hours.
> Since the Master is waiting in an idle state, it does no harm to check the 
> region availability more frequently and cap the maximum sleep time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25229) Instantiate BucketCache before RS creates a their ephemeral node when rolling-upgrade

2020-10-30 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223449#comment-17223449
 ] 

Anoop Sam John commented on HBASE-25229:


Please add some details/description.

> Instantiate BucketCache before RS creates a their ephemeral node when 
> rolling-upgrade
> -
>
> Key: HBASE-25229
> URL: https://issues.apache.org/jira/browse/HBASE-25229
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.13
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25206) Data loss can happen if a cloned table loses original split region(delete table)

2020-10-22 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219457#comment-17219457
 ] 

Anoop Sam John commented on HBASE-25206:


Thanks, [~brfrn169].
So whether or not the snapshot was cloned to a new table, this case will cause 
data loss from the snapshot. 

> Data loss can happen if a cloned table loses original split region(delete 
> table)
> 
>
> Key: HBASE-25206
> URL: https://issues.apache.org/jira/browse/HBASE-25206
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Steps to reproduce are as follows:
> 1. Create a table and put some data into the table:
> {code:java}
> create 'test1','cf'
> put 'test1','r1','cf','v1'
> put 'test1','r2','cf','v2'
> put 'test1','r3','cf','v3'
> put 'test1','r4','cf','v4'
> put 'test1','r5','cf','v5'
> {code}
> 2. Take a snapshot for the table:
> {code:java}
> snapshot 'test1','snap_test'
> {code}
> 3. Clone the snapshot to another table
> {code:java}
> clone_snapshot 'snap_test','test2'
> {code}
> 4. Split the original table
> {code:java}
> split 'test1','r3'
> {code}
> 5. Drop the original table
> {code:java}
> disable 'test1'
> drop 'test1'
> {code}
> After that, we see the error like the following in RS log when opening the 
> regions of the cloned table:
> {code:java}
> 2020-10-20 13:32:18,415 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed initialize of region= 
> test2,,1603200595702.bebdc4f740626206eeccad96b7643261., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: java.io.FileNotFoundException: 
> Unable to open link: org.apache.hadoop.hbase.io.HFileLink 
> locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1095)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:899)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7246)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7204)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7176)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7085)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open 
> link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:590)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:557)
> at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:303)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5731)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1059)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1056)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> 

[jira] [Comment Edited] (HBASE-25206) Data loss can happen if a cloned table loses original split region(delete table)

2020-10-22 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218830#comment-17218830
 ] 

Anoop Sam John edited comment on HBASE-25206 at 10/23/20, 3:01 AM:
---

The data loss (file deleted instead of archived) will happen if we took a 
snapshot, kept only that, and deleted the original table, right? That is, the 
user did not clone a new table from the snapshot initially, but may want to do 
so later. Just confirming.
[~brfrn169]


was (Author: anoop.hbase):
the data loss (file deleted instead of archived) will happen if we took 
snapshot and keep that only but deleted the original table right? Did not clone 
new table from snapshot initially but later at some point user may want to do 
so. Just confirming.

> Data loss can happen if a cloned table loses original split region(delete 
> table)
> 
>
> Key: HBASE-25206
> URL: https://issues.apache.org/jira/browse/HBASE-25206
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Steps to reproduce are as follows:
> 1. Create a table and put some data into the table:
> {code:java}
> create 'test1','cf'
> put 'test1','r1','cf','v1'
> put 'test1','r2','cf','v2'
> put 'test1','r3','cf','v3'
> put 'test1','r4','cf','v4'
> put 'test1','r5','cf','v5'
> {code}
> 2. Take a snapshot for the table:
> {code:java}
> snapshot 'test1','snap_test'
> {code}
> 3. Clone the snapshot to another table
> {code:java}
> clone_snapshot 'snap_test','test2'
> {code}
> 4. Delete the snapshot
> {code:java}
> delete_snapshot 'snap_test'
> {code}
> 5. Split the original table
> {code:java}
> split 'test1','r3'
> {code}
> 6. Drop the original table
> {code:java}
> disable 'test1'
> drop 'test1'
> {code}
> After that, we see the error like the following in RS log when opening the 
> regions of the cloned table:
> {code:java}
> 2020-10-20 13:32:18,415 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed initialize of region= 
> test2,,1603200595702.bebdc4f740626206eeccad96b7643261., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: java.io.FileNotFoundException: 
> Unable to open link: org.apache.hadoop.hbase.io.HFileLink 
> locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1095)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:899)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7246)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7204)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7176)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7085)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open 
> link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:590)
>  

[jira] [Commented] (HBASE-25205) Corrupted hfiles append timestamp every time the region is trying to open

2020-10-22 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218906#comment-17218906
 ] 

Anoop Sam John commented on HBASE-25205:


It's clear now, thanks.
But when the skip-error config is false, the region open will always fail. 
That's a bigger concern. I think I raised this issue somewhere else as well 
(where we discussed making split-to-HFile the default). The RS doing the WAL 
split going down may be a common case.
In the case of splitting to recovered edits (the old way), what happens to a 
recovered edits file that is partial? When the splitting RS dies, another RS 
picks up the WAL split task. Does it then clean up the previous split attempt 
and delete those partial files? I am not sure.

> Corrupted hfiles append timestamp every time the region is trying to open
> -
>
> Key: HBASE-25205
> URL: https://issues.apache.org/jira/browse/HBASE-25205
> Project: HBase
>  Issue Type: Bug
>Reporter: Junhong Xu
>Assignee: Junhong Xu
>Priority: Major
>
> When an RS crashes, we replay its WALs to generate recovered edits or HFiles 
> directly. If the RS replaying the WAL crashes as well, the file being written 
> may be corrupted. In some cases we may want to move on (e.g. when sinking to 
> HFiles, since we still have the WAL and replaying it again is OK) and rename 
> the corrupted file with an extra timestamp as a suffix. But when the region 
> is opened again, the corrupted file can't be opened and is renamed with yet 
> another timestamp. After a few rounds of this, the file name becomes too 
> long to rename. The log looks like this:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException):
>  The maximum path component name limit of 6537855
> 8b0444c27a9d21fb0f4e4293f.1602831270772.1602831291050.1602831296855.1602831408803.1602831493989.1602831584077.1602831600838.1602831659805.1602831736374.1602831738002.1
> 602831959867.1602831979707.1602832095288.1602832103908.1602832538224.1602833079431
>  in directory /hbase/XXX/data/default/IntegrationTestBigLinkedList/aa376ec
> f026a5e63d0703384e34ec6aa/meta/recovered.hfiles is exceeded: limit=255 
> length=256
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:1230)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:98)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:191)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:493)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3080)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1113)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:665)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
> at org.apache.hadoop.ipc.Client.call(Client.java:1504)
> at org.apache.hadoop.ipc.Client.call(Client.java:1435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy17.rename(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:504)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)

[jira] [Commented] (HBASE-25205) Corrupted hfiles append timestamp every time the region is trying to open

2020-10-22 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218847#comment-17218847
 ] 

Anoop Sam John commented on HBASE-25205:


bq. If the replaying WAL RS crashed
Do you mean the RS which is doing the WAL split?
bq. the file just writing to may be corrupted
Do you mean the case of splitting the WAL to HFiles directly? In the case of 
splitting to a recovered edits file, we throw away the old incomplete file and 
create a new one, right?
Can you please explain more about the case where we have an issue? Sorry, I am 
not able to follow fully.

> Corrupted hfiles append timestamp every time the region is trying to open
> -
>
> Key: HBASE-25205
> URL: https://issues.apache.org/jira/browse/HBASE-25205
> Project: HBase
>  Issue Type: Bug
>Reporter: Junhong Xu
>Assignee: Junhong Xu
>Priority: Major
>
> When an RS crashes, we replay its WALs to generate recovered edits or HFiles 
> directly. If the RS replaying the WAL crashes as well, the file being written 
> may be corrupted. In some cases we may want to move on (e.g. when sinking to 
> HFiles, since we still have the WAL and replaying it again is OK) and rename 
> the corrupted file with an extra timestamp as a suffix. But when the region 
> is opened again, the corrupted file can't be opened and is renamed with yet 
> another timestamp. After a few rounds of this, the file name becomes too 
> long to rename. The log looks like this:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException):
>  The maximum path component name limit of 6537855
> 8b0444c27a9d21fb0f4e4293f.1602831270772.1602831291050.1602831296855.1602831408803.1602831493989.1602831584077.1602831600838.1602831659805.1602831736374.1602831738002.1
> 602831959867.1602831979707.1602832095288.1602832103908.1602832538224.1602833079431
>  in directory /hbase/XXX/data/default/IntegrationTestBigLinkedList/aa376ec
> f026a5e63d0703384e34ec6aa/meta/recovered.hfiles is exceeded: limit=255 
> length=256
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:1230)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:98)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:191)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:493)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3080)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1113)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:665)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
> at org.apache.hadoop.ipc.Client.call(Client.java:1504)
> at org.apache.hadoop.ipc.Client.call(Client.java:1435)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy17.rename(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:504)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
> at com.sun.proxy.$Proxy18.rename(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> 

[jira] [Commented] (HBASE-25206) Data loss can happen if a cloned table loses original split region(delete table)

2020-10-22 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218830#comment-17218830
 ] 

Anoop Sam John commented on HBASE-25206:


The data loss (file deleted instead of archived) will happen if we took a 
snapshot, kept only that, and deleted the original table, right? That is, the 
user did not clone a new table from the snapshot initially, but may want to do 
so later. Just confirming.

> Data loss can happen if a cloned table loses original split region(delete 
> table)
> 
>
> Key: HBASE-25206
> URL: https://issues.apache.org/jira/browse/HBASE-25206
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> Steps to reproduce are as follows:
> 1. Create a table and put some data into the table:
> {code:java}
> create 'test1','cf'
> put 'test1','r1','cf','v1'
> put 'test1','r2','cf','v2'
> put 'test1','r3','cf','v3'
> put 'test1','r4','cf','v4'
> put 'test1','r5','cf','v5'
> {code}
> 2. Take a snapshot for the table:
> {code:java}
> snapshot 'test1','snap_test'
> {code}
> 3. Clone the snapshot to another table
> {code:java}
> clone_snapshot 'snap_test','test2'
> {code}
> 4. Delete the snapshot
> {code:java}
> delete_snapshot 'snap_test'
> {code}
> 5. Split the original table
> {code:java}
> split 'test1','r3'
> {code}
> 6. Drop the original table
> {code:java}
> disable 'test1'
> drop 'test1'
> {code}
> After that, we see the error like the following in RS log when opening the 
> regions of the cloned table:
> {code:java}
> 2020-10-20 13:32:18,415 WARN org.apache.hadoop.hbase.regionserver.HRegion: 
> Failed initialize of region= 
> test2,,1603200595702.bebdc4f740626206eeccad96b7643261., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: java.io.FileNotFoundException: 
> Unable to open link: org.apache.hadoop.hbase.io.HFileLink 
> locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1095)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:943)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:899)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7246)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7204)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7176)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7085)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open 
> link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs:// HOST>:8020/hbase/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/.tmp/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/mobdir/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89,
>  hdfs:// HOST>:8020/hbase/archive/data/default/test1/349b766b1b38e21f627ed4e441ae643c/cf/b6e39865710345c8998dec0bcc94cc89]
> at 
> org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:590)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:557)
> at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:303)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5731)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1059)
>  

[jira] [Commented] (HBASE-25211) Rack awareness in region_mover

2020-10-21 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218356#comment-17218356
 ] 

Anoop Sam John commented on HBASE-25211:


We committed a Jira which allows passing exclude hosts and include hosts when 
running the region mover. So using exclude hosts you can specify all the hosts 
in that rack?

> Rack awareness in region_mover
> --
>
> Key: HBASE-25211
> URL: https://issues.apache.org/jira/browse/HBASE-25211
> Project: HBase
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Priority: Major
>
> region_mover should provide an option to ensure that, while unloading all 
> regions, destination servers are selected from racks other than the one the 
> server being unloaded (where the region_mover unload is executed) belongs 
> to. This would be a helpful option if we want to take rack downtime (or do a 
> rack upgrade) by stopping all RegionServers that belong to the same rack for 
> a few hours. Without this option we don't have any control over which 
> destination server is selected, and hence some regions might keep bouncing 
> from server A to server B in the same rack until they are finally moved to 
> an RS that belongs to a separate rack.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23296) Add CompositeBucketCache to support tiered BC

2020-10-19 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216675#comment-17216675
 ] 

Anoop Sam John commented on HBASE-23296:


[~javaman_chen] Still interested in this contribution, I believe. What do you 
say about the above comment/suggestion? 

> Add CompositeBucketCache to support tiered BC
> -
>
> Key: HBASE-23296
> URL: https://issues.apache.org/jira/browse/HBASE-23296
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>
> LruBlockCache is not suitable in the following scenarios:
> (1) the cache size is too large (it takes too much heap memory, and 
> evictBlocksByHfileName is not very efficient, as HBASE-23277 mentioned)
> (2) blocks are evicted frequently, especially when cacheOnWrite & 
> prefetchOnOpen are enabled.
> Since the blocks' data is reclaimed by GC, this may affect GC performance.
> So how about enabling a bucket-based L1 cache?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25191) JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ versions

2020-10-15 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-25191:
---
Affects Version/s: (was: 2.0.1)
   2.0.0

> JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ 
> versions
> ---
>
> Key: HBASE-25191
> URL: https://issues.apache.org/jira/browse/HBASE-25191
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> The regression is caused by 
> https://issues.apache.org/jira/browse/HBASE-15160.
> In order to monitor the FS latencies and pread latencies we have added the 
> MetricsIO as part of metrics. Since we account this at the HFileBlock layer, 
> we have created a static MetricIO variable at HFile.java so that we can use 
> that metrics in a static way.
> Internally the MetricsIO creates a MetricsIOWrapperImpl that in turn 
> registers the metrics with the BaseSource. The flow is as follows,
> {code}
> this(CompatibilitySingletonFactory.getInstance(MetricsRegionServerSourceFactory.class)
> .createIO(wrapper), wrapper);
> {code}
> The createIO() call in turn creates a MetricsIOSourceImpl where the metrics name 
> is 'IO'.
> The BaseSourceImpl registers a singleton JVMMetrics
> {code}
> synchronized void init(String name) {
>   ...
>   DefaultMetricsSystem.initialize(HBASE_METRICS_SYSTEM_NAME);
>   JvmMetrics.initSingleton(name, "");
>
> }
> {code}
> The name passed here is 'IO'.  This is where the processName gets set to 
> 'IO'.
> All other metrics that we create in the HRS and master are not static-level 
> metrics; they are instance-level metrics. So previously, the very first time we 
> created either Master metrics or RegionServer metrics, the metrics 
> would have had the processName as either RegionServer or Master.
> But please note, I am not very sure now whether we are creating a 
> metric based on the actual process name the way it was during the hbase-1.x 
> time. In other words, my doubt is: even if we solve this 'IO' process case, do 
> we really get back the processName as 'Master' or 'RegionServer' as in 
> https://issues.apache.org/jira/browse/HBASE-12328?
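For illustration, a toy model of the init-once behaviour described above (the class and method names here are invented for the example and are not the actual Hadoop metrics2 classes): whichever caller initializes the singleton first decides the processName, and later callers cannot change it.

{code:java}
// Toy model only: an init-once singleton where the first caller's name wins.
public final class ToyJvmMetrics {
  private static ToyJvmMetrics instance;
  private final String processName;

  private ToyJvmMetrics(String processName) {
    this.processName = processName;
  }

  public static synchronized ToyJvmMetrics initSingleton(String processName) {
    if (instance == null) {
      instance = new ToyJvmMetrics(processName);
    }
    return instance; // an existing instance keeps its original processName
  }

  public String getProcessName() {
    return processName;
  }

  public static void main(String[] args) {
    ToyJvmMetrics.initSingleton("IO");                    // the static MetricsIO path wins the race
    ToyJvmMetrics m = ToyJvmMetrics.initSingleton("RegionServer");
    System.out.println(m.getProcessName());               // prints "IO", not "RegionServer"
  }
}
{code}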



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25186) TestMasterRegionOnTwoFileSystems is failing after HBASE-25065

2020-10-14 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213848#comment-17213848
 ] 

Anoop Sam John commented on HBASE-25186:


The introduction of afterRoll() was for MasterRegion, correct?  Maybe what we 
wanted is an afterArchive() which will be called after archiving 1+ files?
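A minimal sketch of the afterArchive() idea (the interface and names below are hypothetical, not existing HBase API): the archiver notifies listeners only after a batch of WALs has actually been moved, instead of the roller assuming the move happened inside afterRoll().

{code:java}
import java.util.List;

// Hypothetical callback sketch; not actual HBase API.
interface WALArchiveListener {
  /** Called after one or more WAL files have been moved to the archive directory. */
  void afterArchive(List<String> archivedWalPaths);
}

// Example listener: the MasterRegion could move its WALs from the region oldWAL
// dir to the global oldWAL dir here, instead of doing it in afterRoll().
class MasterRegionArchiveListener implements WALArchiveListener {
  @Override
  public void afterArchive(List<String> archivedWalPaths) {
    for (String wal : archivedWalPaths) {
      System.out.println("archived: " + wal); // placeholder for the move-to-global-dir logic
    }
  }
}
{code}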

> TestMasterRegionOnTwoFileSystems is failing after HBASE-25065
> -
>
> Key: HBASE-25186
> URL: https://issues.apache.org/jira/browse/HBASE-25186
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: Duo Zhang
>Priority: Blocker
>
> After HBASE-25065, we are having a test case failure with 
> TestMasterRegionOnTwoFileSystems. 
> The reason is that we manually trigger a WAL roll on the master region. As 
> part of the WAL roll we expect the Master region's WAL will also be moved 
> from the region oldWAL dir to the global oldWAL directory. This happens in the 
> afterRoll() method in AbstractWALRoller. 
> Since now the WAL archival is asynchronous, the afterRoll() method does not 
> find any WAL file to be moved in the local region oldWAL dir. So the movement 
> to the global oldWAL dir does not happen. 
> The test case checks for the file in the oldWAL dir, and since it is not found 
> the test times out. We need a way to fix this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25142) Auto-fix 'Unknown Server'

2020-10-01 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205978#comment-17205978
 ] 

Anoop Sam John commented on HBASE-25142:


cc [~taklo...@gmail.com], [~zyork]

> Auto-fix 'Unknown Server'
> -
>
> Key: HBASE-25142
> URL: https://issues.apache.org/jira/browse/HBASE-25142
> Project: HBase
>  Issue Type: Improvement
>Reporter: Michael Stack
>Priority: Major
>
> Addressing reports of 'Unknown Server' has come up in various conversations 
> lately. This issue is about fixing instances of 'Unknown Server' 
> automatically as part of the tasks undertaken by CatalogJanitor when it runs.
> First though, would like to figure a definition for 'Unknown Server' and a 
> list of ways in which they arise. We need this to figure how to do safe 
> auto-fixing.
> Currently an 'Unknown Server' is a server found in hbase:meta that is not 
> online (no recent heartbeat) and that is not mentioned in the dead servers 
> list.
> In outline, I'd think CatalogJanitor could schedule an expiration of the RS 
> znode in zk (if exists) and then an SCP if it finds an 'Unknown Server'. 
> Perhaps it waits for 2x or 10x the heartbeat interval just-in-case (or not). 
> The SCP would clean up any references in hbase:meta by reassigning Regions 
> assigned the 'Unknown Server' after replaying any WALs found in hdfs 
> attributed to the dead server.
> As to how they arise:
>  * A contrived illustration would be a large online cluster crashing down with 
> a massive backlog of WAL files – zk went down for some reason say. The replay 
> of the WALs look like it could take a very long time  (lets say the cluster 
> was badly configured and a bug and misconfig made it so each RS was carrying 
> hundreds of WALs and there are hundreds of servers). To get the service back 
> online, the procedure store and WALs are moved aside (for later replay with 
> WALPlayer). The cluster comes up. meta is onlined but refers to server 
> instances that are no longer around. Can schedule an SCP per server mentioned 
> in the 'HBCK Report' by scraping and scripting hbck2 or, better, 
> catalogjanitor could just do it.
>  * HBASE-24286 HMaster won't become healthy after after cloning... describes 
> starting a cluster over data that is hfile-content only. In this case the 
> original servers used to manufacture the hfile cluster data are long dead yet 
> meta still refers to the old servers. They will not make the 'dead servers' 
> list.
> Let this issue stew awhile. Meantime collect how 'Unknown Server' gets 
> created and best way to fix.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25065) WAL archival can be batched/throttled and also done by a separate thread

2020-09-20 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199162#comment-17199162
 ] 

Anoop Sam John commented on HBASE-25065:


+1 to do the rename in another thread. For a cloud FS, rename is not just a meta op.

> WAL archival can be batched/throttled and also done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 
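A rough sketch of the batching/throttling idea under discussion (class and method names are invented for illustration; the real change lives in the WAL roll/archival code): the roll path only enqueues archivable WALs, and a single background thread renames them in small, throttled batches outside the rollWriterLock.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the WAL roll only enqueues files eligible for archival; one background
// thread drains the queue in small batches and pauses between batches, so the
// renames are throttled and never run under the rollWriterLock.
public class ThrottledWalArchiver implements Runnable {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
  private final int batchSize;              // e.g. 10 renames per batch
  private final long pauseBetweenBatchesMs; // throttle between batches

  public ThrottledWalArchiver(int batchSize, long pauseBetweenBatchesMs) {
    this.batchSize = batchSize;
    this.pauseBetweenBatchesMs = pauseBetweenBatchesMs;
  }

  /** Called from the roll path; returns immediately. */
  public void submit(List<String> archivableWals) {
    pending.addAll(archivableWals);
  }

  @Override
  public void run() {
    List<String> batch = new ArrayList<>(batchSize);
    try {
      while (!Thread.currentThread().isInterrupted()) {
        batch.add(pending.take());              // block until there is work
        pending.drainTo(batch, batchSize - 1);  // fill up the rest of the batch
        for (String wal : batch) {
          archive(wal);                         // one rename per file
        }
        batch.clear();
        Thread.sleep(pauseBetweenBatchesMs);    // throttle the rename load
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void archive(String wal) {
    // placeholder for FileSystem#rename(oldWalPath, archivePath)
    System.out.println("archiving " + wal);
  }
}
{code}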



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25052) FastLongHistogram#getCountAtOrBelow method is broken.

2020-09-17 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198089#comment-17198089
 ] 

Anoop Sam John commented on HBASE-25052:


cc [~vjasani]

> FastLongHistogram#getCountAtOrBelow method is broken.
> -
>
> Key: HBASE-25052
> URL: https://issues.apache.org/jira/browse/HBASE-25052
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.6.0, 2.2.3
>Reporter: Rushabh Shah
>Priority: Major
>
> FastLongHistogram#getCountAtOrBelow method is broken.
> If I revert HBASE-23245 then it works fine.
> Wrote a small test case in TestHistogramImpl.java : 
> {code:java}
>   @Test
>   public void testAdd1() {
> HistogramImpl histogram = new HistogramImpl();
> for (int i = 0; i < 100; i++) {
>   histogram.update(i);
> }
> Snapshot snapshot = histogram.snapshot();
> // This should return count as 6 since we added 0, 1, 2, 3, 4, 5
> Assert.assertEquals(6, snapshot.getCountAtOrBelow(5));
>   }
> {code}
> It fails as below:
> java.lang.AssertionError: 
> Expected :6
> Actual  :100



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25032) Wait for region server to become online before adding it to online servers in Master

2020-09-17 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198085#comment-17198085
 ] 

Anoop Sam John commented on HBASE-25032:


Should the RS in the source cluster stay idle unless it is able to connect to the peer?  Or 
rather, should the init of replication happen on an async path in RS init?  That might cause a 
larger replication backlog, but it may be better than keeping the RS idle for 
minutes.   Replication is not its primary duty, right? Throwing questions out for 
brainstorming...

> Wait for region server to become online before adding it to online servers in 
> Master
> 
>
> Key: HBASE-25032
> URL: https://issues.apache.org/jira/browse/HBASE-25032
> Project: HBase
>  Issue Type: Bug
>Reporter: Sandeep Guggilam
>Assignee: Sandeep Guggilam
>Priority: Major
>
> As part of RS start up, the RS reports for duty to the Master. The Master acknowledges 
> the request and adds it to the onlineServers list for further assigning any 
> regions to the RS.
> Once Master acknowledges the reportForDuty and sends back the response, RS 
> does a bunch of stuff like initializing replication sources etc before 
> becoming online. However, sometimes there could be an issue with initializing 
> replication sources when it is unable to connect to peer clusters because of 
> some kerberos configuration and there would be a delay of around 20 mins in 
> becoming online.
>  
> Since master considers it online, it tries to assign regions and which fails 
> with ServerNotRunningYet exception, then the master tries to unassign which 
> again fails with the same exception leading the region to FAILED_CLOSE state.
>  
> It would be good to have a check to see if the RS is ready to accept the 
> assignment requests before adding it to online servers list which would 
> account for any such delays as described above



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24919) A tool to rewrite corrupted HFiles

2020-08-21 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24919.

Resolution: Duplicate

Dup of HBASE-24920

> A tool to rewrite corrupted HFiles
> --
>
> Key: HBASE-24919
> URL: https://issues.apache.org/jira/browse/HBASE-24919
> Project: HBase
>  Issue Type: Brainstorming
>  Components: hbase-operator-tools
>Reporter: Andrey Elenskiy
>Priority: Major
>
> Typically I have been dealing with corrupted HFiles (due to loss of hdfs 
> blocks) by just removing them. However, it always seemed wasteful to throw 
> away the entire HFile (which can be hundreds of gigabytes), just because one 
> hdfs block is missing (128MB).
> I think there's a possibility for a tool that can rewrite an HFile by 
> skipping corrupted blocks. 
> There can be multiple types of issues with hdfs blocks but any of them can be 
> treated as if the block doesn't exist:
> 1. All the replicas can be lost
> 2. The block can be corrupted due to some bug in hdfs (I've recently run into 
> HDFS-15186 by experimenting with EC).
> At the simplest the tool can be a local mapreduce job (mapper only) with a 
> custom HFile reader input that can seek to next DATABLK to skip corrupted 
> hdfs blocks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24282) 'scandetail' log message is missing when responseTooSlow happens on the first scan rpc call

2020-08-19 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180955#comment-17180955
 ] 

Anoop Sam John commented on HBASE-24282:


[~songxincun]   can you please put a link to the PR which fixed this bug? 

> 'scandetail' log message is missing when responseTooSlow happens on the first 
> scan rpc call
> ---
>
> Key: HBASE-24282
> URL: https://issues.apache.org/jira/browse/HBASE-24282
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: song XinCun
>Assignee: song XinCun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.2.5
>
>
> When responseTooSlow happens, the 'scandetail' message is printed to the warn 
> log. But when the call is the first scan rpc call, this message is missing. 
> This is because we get the 'scandetail' message with scannerId, but the first 
> scan rpc call doesn't have scannerId yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-18 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179548#comment-17179548
 ] 

Anoop Sam John commented on HBASE-23035:


Ya, I have seen this retainAssignment behaviour in 1.x based clusters.  What I mean is 
that in some parts of the AM we have configs which say whether locality is to be 
considered, and the calculation is done based on that. So a locality-sensitive cluster can 
have such a config (maybe a new one) turned ON.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashes, the regions will try to use the old location for the 
> region deploy. But one RS only has 3 threads to open regions by default. If a 
> RS has hundreds of regions, the failover is very slow. Assigning to the same RS 
> may give good locality if the Datanode is deployed on the same host. But slower 
> failover makes the availability worse. And the locality is not a big deal when 
> deploying HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24289) Heterogeneous Storage for Date Tiered Compaction

2020-08-14 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177761#comment-17177761
 ] 

Anoop Sam John commented on HBASE-24289:


[~pengmq1],  Can you please extend the release notes to include all configs 
associated with this feature: what the different configs are and how to tune the 
default values, if any.  Right now it covers only the one config which is about 
enabling/disabling it.  Thanks.

> Heterogeneous Storage for Date Tiered Compaction
> 
>
> Key: HBASE-24289
> URL: https://issues.apache.org/jira/browse/HBASE-24289
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Mengqing Peng
>Assignee: Mengqing Peng
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Support DateTieredCompaction (HBASE-15181) for cold and hot data separation, 
> i.e. support different storage policies for different time periods of data to get 
> better performance. For example, we can configure the data of the last 1 month to be on 
> SSD, and data older than 1 month on HDD.
> design doc: 
> https://docs.google.com/document/d/1fk_EWLNnxniwt3gDjUS_apQ3cPzn90AmvDT1wkirvKE/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-13 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176848#comment-17176848
 ] 

Anoop Sam John commented on HBASE-23035:


[~Bo Cui] So your use case is an entire cluster restart, and as part of that you want 
the regions to come back to the old RSs themselves (as much as possible), so 
locality can be preserved.
There is some config around the LB which says whether to consider the data 
locality aspect in deciding the plan.  Can we make use of the same thing?  Maybe 
not; I don't remember the details of that conf. [~zghao] 

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashes, the regions will try to use the old location for the 
> region deploy. But one RS only has 3 threads to open regions by default. If a 
> RS has hundreds of regions, the failover is very slow. Assigning to the same RS 
> may give good locality if the Datanode is deployed on the same host. But slower 
> failover makes the availability worse. And the locality is not a big deal when 
> deploying HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-11 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175277#comment-17175277
 ] 

Anoop Sam John commented on HBASE-23035:


[~zghao].. So how is the fix now?  Will the SSH round-robin the assignment even 
if the down RS came back by the time the AM starts its work? Or are there some 
configs to control this? Or any other way?  Sorry, I did not see the patch.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashes, the regions will try to use the old location for the 
> region deploy. But one RS only has 3 threads to open regions by default. If a 
> RS has hundreds of regions, the failover is very slow. Assigning to the same RS 
> may give good locality if the Datanode is deployed on the same host. But slower 
> failover makes the availability worse. And the locality is not a big deal when 
> deploying HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175247#comment-17175247
 ] 

Anoop Sam John commented on HBASE-24850:


{code}
if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
  diff = BBKVComparator.compare((ByteBufferKeyValue)a, 
(ByteBufferKeyValue)b, ignoreSequenceid);
  if (diff != 0) {
return diff;
  }
} else {
  diff = compareRows(a, b);
  if (diff != 0) {
return diff;
  }

  diff = compareWithoutRow(a, b);
  if (diff != 0) {
return diff;
  }
}
{code}
If the Cells are BBKVs, we have a BBKVComparator path which has 
optimizations and avoids decoding the offset/length many times.  But on the normal 
read path, where we get KVs, we follow the else block and end up doing the 
offset/length decoding many times.  In HBASE-24754, that is what Ram's test 
revealed. 
At first we can try optimizing this else code path to avoid this decoding again 
and again, and instead reuse already decoded values.   Then we can see 
whether/how we can make a CellComparator type for Cells backed by a contiguous 
data structure (byte[]/BB).
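As an illustration of "reuse already decoded values", a toy sketch that decodes the row/family/qualifier offsets and lengths of a flat, contiguously backed cell once and runs all compare steps against the cached values. The byte layout and class names below are invented for the example; they are not the real KeyValue format.

{code:java}
// Toy sketch: decode offsets/lengths once, then compare against the cached values
// instead of re-deriving them for each compare step. Invented layout:
// [2-byte row len][row][1-byte family len][family][qualifier].
public final class FlatCellSlice {
  final byte[] buf;
  final int rowOffset, rowLen, familyOffset, familyLen, qualifierOffset, qualifierLen;

  public FlatCellSlice(byte[] buf, int offset, int totalLen) {
    this.buf = buf;
    this.rowLen = ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
    this.rowOffset = offset + 2;
    this.familyLen = buf[rowOffset + rowLen] & 0xff;
    this.familyOffset = rowOffset + rowLen + 1;
    this.qualifierOffset = familyOffset + familyLen;
    this.qualifierLen = offset + totalLen - qualifierOffset;
  }

  static int compare(FlatCellSlice a, FlatCellSlice b) {
    int d = compareBytes(a.buf, a.rowOffset, a.rowLen, b.buf, b.rowOffset, b.rowLen);
    if (d != 0) return d;
    d = compareBytes(a.buf, a.familyOffset, a.familyLen, b.buf, b.familyOffset, b.familyLen);
    if (d != 0) return d;
    return compareBytes(a.buf, a.qualifierOffset, a.qualifierLen,
        b.buf, b.qualifierOffset, b.qualifierLen);
  }

  private static int compareBytes(byte[] l, int lo, int ll, byte[] r, int ro, int rl) {
    int n = Math.min(ll, rl);
    for (int i = 0; i < n; i++) {
      int d = (l[lo + i] & 0xff) - (r[ro + i] & 0xff);
      if (d != 0) return d;
    }
    return ll - rl;
  }
}
{code}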

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.4.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer.  (Again the gain is huge 
> because of the large number of compare ops that test is doing.)  This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24850:
---
Fix Version/s: (was: 2.0.0)
   2.4.0

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Priority: Critical
> Fix For: 2.4.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer.  (Again the gain is huge 
> because of the large number of compare ops that test is doing.)  This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24850:
---
Affects Version/s: (was: 2.4.0)
   2.0.0

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Priority: Critical
> Fix For: 2.0.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer.  (Again the gain is huge 
> because of the large number of compare ops that test is doing.)  This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175177#comment-17175177
 ] 

Anoop Sam John commented on HBASE-24754:


HBASE-24850 will track generic CellComparator level improvement

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test Input: 
> 1: Table with 500 regions (300 column families)
> 2: data = 2 TB
> Data Sample
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 Masters + 5 RegionServers)
>  4: No. of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same for both 
> clusters
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175176#comment-17175176
 ] 

Anoop Sam John commented on HBASE-24850:


We should optimize it at the CellComparatorImpl level itself so that all flows 
can take advantage. This can be a factor in any overall perf issue which deals with so 
many Cells and compares. (The other 2.x perf issue of filtering cells in a 
range scan - HBASE-24637.)
In the early days of CellComparatorImpl, there were some optimizations and 
many overloaded compareXXX methods which took not just Cells but also a few 
offsets/lengths. I think those eventually got cleaned up. But such cleanup 
affects perf very much, as we are seeing now.
In the case of KeyValue, the biggest advantage is that we know it is backed by a single 
contiguous data structure, so we have ways to parse offset/length without 
doing back-to-back decoding of the other lengths every time. In a generic Cell and 
CellComparator such assumptions are not possible. But normally in HBase, most of 
the time the Cells flowing through will be KV or BBKV, both backed by a contiguous 
data structure. We can think of having a new interface to mark such Cells and 
a CellComparator impl to take advantage of that. This needs a bigger effort but it is 
worth it.
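A minimal sketch of the marker-interface idea (the interface and comparator below are hypothetical, not existing HBase types): cells that advertise a contiguous backing array get a fast compare path, while other Cell implementations would fall back to a generic path.

{code:java}
// Hypothetical marker for cells backed by a single contiguous byte[] region,
// plus a comparator that takes the fast path when both sides advertise it.
interface ContiguousBackedCell {
  byte[] backingArray();
  int rowOffset();
  short rowLength();
}

final class SketchCellComparator {
  static int compareRows(Object a, Object b) {
    if (a instanceof ContiguousBackedCell && b instanceof ContiguousBackedCell) {
      ContiguousBackedCell ca = (ContiguousBackedCell) a;
      ContiguousBackedCell cb = (ContiguousBackedCell) b;
      // Fast path: offsets/lengths come straight from the flat layout,
      // no repeated decoding per compare step.
      return compare(ca.backingArray(), ca.rowOffset(), ca.rowLength(),
          cb.backingArray(), cb.rowOffset(), cb.rowLength());
    }
    // A generic (slower) path for arbitrary Cell implementations would go here.
    throw new UnsupportedOperationException("generic path omitted in this sketch");
  }

  private static int compare(byte[] l, int lo, int ll, byte[] r, int ro, int rl) {
    int n = Math.min(ll, rl);
    for (int i = 0; i < n; i++) {
      int d = (l[lo + i] & 0xff) - (r[ro + i] & 0xff);
      if (d != 0) return d;
    }
    return ll - rl;
  }
}
{code}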

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.4.0
>Reporter: Anoop Sam John
>Priority: Critical
> Fix For: 2.0.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer.  (Again the gain is huge 
> because of the large number of compare ops that test is doing.)  This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24850:
---
Description: 
We have multiple perf issues in 2.x versions compared to 1.x, e.g. HBASE-24754, 
HBASE-24637.
The pattern is clear: wherever we do more and more Cell compares, there 
is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
we see much better perf for the PutSortReducer.  (Again the gain is huge 
because of the large number of compare ops that test is doing.)  This issue is to 
address and optimize compares generally in CellComparatorImpl itself.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.4.0
>Reporter: Anoop Sam John
>Priority: Critical
> Fix For: 2.0.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation.   In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer.  (Again the gain is huge 
> because of the large number of compare ops that test is doing.)  This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24850:
--

 Summary: CellComparator perf improvement
 Key: HBASE-24850
 URL: https://issues.apache.org/jira/browse/HBASE-24850
 Project: HBase
  Issue Type: Improvement
  Components: Performance, scan
Affects Versions: 2.4.0
Reporter: Anoop Sam John
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24849) Branch-1 Backport : HBASE-24665 MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175171#comment-17175171
 ] 

Anoop Sam John commented on HBASE-24849:


[~wenfeiyi666]  I created this issue for the backport to branch-1, as we had to 
close the other jira for the 2.3 release RC.
Please work on the comment on the current PR for branch-1 and raise a new PR with 
this jira title. Thanks for your work.

> Branch-1 Backport : HBASE-24665 MultiWAL :  Avoid rolling of ALL WALs when 
> one of the WAL needs a roll
> --
>
> Key: HBASE-24849
> URL: https://issues.apache.org/jira/browse/HBASE-24849
> Project: HBase
>  Issue Type: Bug
>Reporter: Anoop Sam John
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24849) Branch-1 Backport : HBASE-24665 MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24849:
--

 Summary: Branch-1 Backport : HBASE-24665 MultiWAL :  Avoid rolling 
of ALL WALs when one of the WAL needs a roll
 Key: HBASE-24849
 URL: https://issues.apache.org/jira/browse/HBASE-24849
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John
Assignee: wenfeiyi666
 Fix For: 1.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24665) MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24665.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to trunk, branch-2, branch-2.3, branch-2.2.. Thanks for the patch 
[~wenfeiyi666]

> MultiWAL :  Avoid rolling of ALL WALs when one of the WAL needs a roll
> --
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.7
>
>
> When using multiwal, if any one WAL requests a roll, all WALs will be rolled together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24665) MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24665:
---
Fix Version/s: (was: 1.7.0)

> MultiWAL :  Avoid rolling of ALL WALs when one of the WAL needs a roll
> --
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.7
>
>
> When using multiwal, if any one WAL requests a roll, all WALs will be rolled together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175168#comment-17175168
 ] 

Anoop Sam John commented on HBASE-24754:


We should optimize it at the CellComparatorImpl level itself so that all flows 
can take advantage.   This can be a factor in any overall perf issue which deals with 
so many Cells and compares. (The other 2.x perf issue of filtering cells in a 
range scan - HBASE-24637.)
In the early days of CellComparatorImpl, there were some optimizations and 
many overloaded compareXXX methods which took not just Cells but also a few 
offsets/lengths. I think those eventually got cleaned up. But such cleanup 
affects perf very much, as we are seeing now.
In the case of KeyValue, the biggest advantage is that we know it is backed by a single 
contiguous data structure, so we have ways to parse offset/length without 
doing back-to-back decoding of the other lengths every time.  In a generic Cell and 
CellComparator such assumptions are not possible.   But normally in HBase, most 
of the time the Cells flowing through will be KV or BBKV, both backed by a contiguous 
data structure.  We can think of having a new interface to mark such Cells and 
a CellComparator impl to take advantage of that.  This needs a bigger effort but it is 
worth it.

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test Input: 
> 1: Table with 500 regions (300 column families)
> 2: data = 2 TB
> Data Sample
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 Masters + 5 RegionServers)
>  4: No. of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same for both 
> clusters
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24665) MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175153#comment-17175153
 ] 

Anoop Sam John commented on HBASE-24665:


[~ndimiduk]..  This fix has gone into 2.3 already.  The jira was not closed 
because of the pending branch-1 commit.  There were some issues in that patch.  I will 
close this jira and track the branch-1 work as another backport issue. Sounds ok?

> MultiWAL :  Avoid rolling of ALL WALs when one of the WAL needs a roll
> --
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
>
> When using multiwal, if any one WAL requests a roll, all WALs will be rolled together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-21721) FSHLog : reduce write#syncs() times

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-21721:
---
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

Pushed to branch-1 and branch-2.2 also.. Thanks for the patch [~Bo Cui].  

> FSHLog : reduce write#syncs() times
> ---
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> the number of write#syncs can be reduced by updating the 
> highestUnsyncedSequence:
> before write#sync(), get the current highestUnsyncedSequence 
> after write#sync, highestSyncedSequence=highestUnsyncedSequence
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if( unSyncedFlushSeq > currentSequence ) currentSequence = 
> unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-21721) FSHLog : reduce write#syncs() times

2020-08-07 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173573#comment-17173573
 ] 

Anoop Sam John commented on HBASE-21721:


Pushed to trunk, branch-2, branch-2.3
[~Bo Cui]  Can you please raise a PR for branch-2.2, as it is not possible to 
cherry-pick there; conflicts are coming.  This also needs a branch-1 PR.

> FSHLog : reduce write#syncs() times
> ---
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> the number of write#syncs can be reduced by updating the 
> highestUnsyncedSequence:
> before write#sync(), get the current highestUnsyncedSequence 
> after write#sync, highestSyncedSequence=highestUnsyncedSequence
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if( unSyncedFlushSeq > currentSequence ) currentSequence = 
> unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-21721) FSHLog : reduce write#syncs() times

2020-08-07 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-21721:
---
Summary: FSHLog : reduce write#syncs() times  (was: reduce write#syncs() 
times)

> FSHLog : reduce write#syncs() times
> ---
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> the number of write#syncs can be reduced by updating the 
> highestUnsyncedSequence:
> before write#sync(), get the current highestUnsyncedSequence 
> after write#sync, highestSyncedSequence=highestUnsyncedSequence
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if( unSyncedFlushSeq > currentSequence ) currentSequence = 
> unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24753) HA masters based on raft

2020-08-05 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171877#comment-17171877
 ] 

Anoop Sam John commented on HBASE-24753:


Thanks Nick... Yes, even what he mentioned is also possible.  Clone a cluster.  
Dropping and recreating a cluster based on saved data is a very common thing.  
HBase has always used a FS like HDFS or cloud storage for storing any persistent data.  
This is really great for cloud cases.  For any system which deals with local storage 
(replicated, with a raft kind of consensus), it won't be easy to make it work in the 
cloud.
bq.And even for now, it is not safe to just restart a new cluster with data on 
HDFS but no data on zookeeper. 
Duo, my concern was not about avoiding zk usage and all HMs having their own consensus for 
leader election (as the jira title says).  My worry was about the line which 
says to move the root table data (meta as of today) away from storage to local disk, 
with the HM handling it in a special way.   


> HA masters based on raft
> 
>
> Key: HBASE-24753
> URL: https://issues.apache.org/jira/browse/HBASE-24753
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Reporter: Duo Zhang
>Priority: Major
>
> For better availability, for moving bootstrap information from zookeeper to 
> our own service so finally we could remove the dependency on zookeeper 
> completely.
> This has been in my mind for a long time, and since there is a discussion 
> in HBASE-11288 about how to store the root table, and also in HBASE-24749, where we 
> want to have better performance on a filesystem that cannot support list and 
> rename well, which requires a storage engine at the bottom to store the 
> storefile information for the meta table, I think it is the time to throw this 
> idea out.
> The basic solution is to build a raft group to store the bootstrap 
> information, for now it is cluster id(it is on the file system already?) and 
> the root table. For region servers they will always go to the leader to ask 
> for the information so they can always see the newest data, and for client, 
> we enable 'follower read', to reduce the load of the leader(and there are 
> some solutions to even let 'follower read' to always get the newest data in 
> raft).
> With this solution in place, as long as root table will not be in a format of 
> region(we could just use rocksdb to store it locally), the cyclic dependency 
> in HBASE-24749 has also been solved, as we do not need to find a place to 
> store the storefiles information for root table any more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-08-03 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169799#comment-17169799
 ] 

Anoop Sam John edited comment on HBASE-24791 at 8/3/20, 8:04 AM:
-

Pushed to trunk. Thanks for the patch [~chenyechao]. 
Thanks all for the reviews.


was (Author: anoop.hbase):
Pushed to master.

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1
>
>
> Bulkload uses HFileOutputFormat2 to write HFiles. 
> In HFileOutputFormat2.RecordWriter, 
> the write method always calls the getTableRelativePath method each time. 
> This is unnecessary. 
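A minimal sketch of the improvement being described (the names below are illustrative; the real HFileOutputFormat2.RecordWriter is more involved): compute the table-relative path once when the writer is created and reuse it on every write() call.

{code:java}
// Sketch: cache the table-relative output path at construction time and reuse
// it on every write(), instead of re-deriving it per call as the issue describes.
class SketchHFileRecordWriter {
  private final String tableRelativePath; // cached once at construction
  private long cellsWritten;

  SketchHFileRecordWriter(String namespace, String tableName) {
    // Previously this was effectively recomputed inside write() for every cell.
    this.tableRelativePath = namespace + "/" + tableName;
  }

  /** Returns the directory this cell would be written under (illustrative only). */
  String write(byte[] rowKey, byte[] value) {
    cellsWritten++;
    // Reuse the cached path; no per-call recomputation of the table-relative path.
    return tableRelativePath + "/cf"; // illustrative column-family subdir
  }

  long getCellsWritten() {
    return cellsWritten;
  }
}
{code}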



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-08-03 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24791.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master.

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1
>
>
> Bulkload uses HFileOutputFormat2 to write HFiles. 
> In HFileOutputFormat2.RecordWriter, 
> the write method always calls the getTableRelativePath method each time. 
> This is unnecessary. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-21721) reduce write#syncs() times

2020-08-02 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reassigned HBASE-21721:
--

Assignee: Bo Cui

> reduce write#syncs() times
> --
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> the number of write#syncs can be reduced by updating the 
> highestUnsyncedSequence:
> before write#sync(), get the current highestUnsyncedSequence 
> after write#sync, highestSyncedSequence=highestUnsyncedSequence
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if( unSyncedFlushSeq > currentSequence ) currentSequence = 
> unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-21721) reduce write#syncs() times

2020-08-02 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-21721:
---
Fix Version/s: 2.2.6
   2.4.0
   1.7.0
   2.3.1
   3.0.0-alpha-1

> reduce write#syncs() times
> --
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> the number of write#syncs can be reduced by updating the 
> highestUnsyncedSequence:
> before write#sync(), get the current highestUnsyncedSequence 
> after write#sync, highestSyncedSequence=highestUnsyncedSequence
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if( unSyncedFlushSeq > currentSequence ) currentSequence = 
> unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if( unSyncedFlushSeq > currentSequence ) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

2020-08-02 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169726#comment-17169726
 ] 

Anoop Sam John commented on HBASE-24713:


branch-2?

> RS startup with FSHLog throws NPE after HBASE-21751
> ---
>
> Key: HBASE-24713
> URL: https://issues.apache.org/jira/browse/HBASE-24713
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> Every RS startup creates this NPE
> {code}
> [sync.1] wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:582)
> at java.lang.Thread.run(Thread.java:748)
> 2020-07-07 10:51:23,208 WARN  [regionserver/x:16020] wal.FSHLog: Failed 
> sync-before-close but no outstanding appends; closing 
> WALjava.lang.NullPointerException
> {code}
> The reason is that the Disruptor framework starts the SyncRunner thread but 
> the init of the writer happens after that. A simple null check in the 
> SyncRunner will help here.
> No major damage happens though, since we handle the Throwable/Exception. It will be 
> good to solve this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24800) Enhance ACL region initialization

2020-08-02 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169698#comment-17169698
 ] 

Anoop Sam John commented on HBASE-24800:


60K tables?!
So here a zk-level call optimization is proposed. I have 2 other aspects to 
mention.
- We use zk as a way to notify all RSs to update their cache of the table 
permission details when the ACL table is updated. But we end up writing all the ACL 
table detail into zk also. With this many tables, that is a lot of data 
in zk.  Should we think of a way where we just inform the RSs about the ACL 
content change, and each RS reads the latest changed content from the ACL 
region in order to update its cache? This read can be time-range based if every 
RS tracks the latest TS of its local ACL cache content.
- Per RS we keep the ACL detail cached in the RS. How big is this growing in your 
cases with this many tables? Per RS you might have regions from many tables 
also, right?  Even for the local cache we need some cap on the heap usage?
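For the single-multi() idea proposed in the issue below, a rough sketch using the plain ZooKeeper client API (this is not the actual ZKPermissionWatcher code; the batch size, znode paths, and the create-then-setData handling are assumptions, and a real implementation would also need to handle znodes that already exist):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: instead of two round trips per table (create + setData), build one
// multi() per chunk of tables. Values are the serialized permissions per table.
public class AclZkBatchWriter {
  private static final int OPS_PER_MULTI = 128; // illustrative batch size

  public static void writeAll(ZooKeeper zk, String aclZNode, Map<String, byte[]> permsByTable)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<>();
    for (Map.Entry<String, byte[]> e : permsByTable.entrySet()) {
      String path = aclZNode + "/" + e.getKey();
      // Assumes the znode does not exist yet; Op.create fails the whole multi on NodeExists,
      // so a real implementation would check existence or split the batch on failure.
      ops.add(Op.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
      ops.add(Op.setData(path, e.getValue(), -1));
      if (ops.size() >= OPS_PER_MULTI) {
        zk.multi(ops); // one round trip for the whole chunk
        ops.clear();
      }
    }
    if (!ops.isEmpty()) {
      zk.multi(ops);
    }
  }
}
{code}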

> Enhance ACL region initialization
> -
>
> Key: HBASE-24800
> URL: https://issues.apache.org/jira/browse/HBASE-24800
> Project: HBase
>  Issue Type: Improvement
>  Components: acl, MTTR
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Major
>
> The RegionServer persists ACL table entries into ZooKeeper during ACL region open:
> {code}
>   private void initialize(RegionCoprocessorEnvironment e) throws IOException {
> final Region region = e.getRegion();
> Configuration conf = e.getConfiguration();
> Map<byte[], ListMultimap<String, UserPermission>> tables = PermissionStorage.loadAll(region);
> // For each table, write out the table's permissions to the respective
> // znode for that table.
> for (Map.Entry<byte[], ListMultimap<String, UserPermission>> t : tables.entrySet()) {
>   byte[] entry = t.getKey();
>   ListMultimap<String, UserPermission> perms = t.getValue();
>   byte[] serialized = PermissionStorage.writePermissionsAsBytes(perms, conf);
>   zkPermissionWatcher.writeToZookeeper(entry, serialized);
> }
> initialized = true;
>   }
> {code}
> Currently the RegionServer sends 2 RPCs (one to create the table path and another 
> to set the data) for each table, sequentially.
> {code}
>  try {
>   ZKUtil.createWithParents(watcher, zkNode);
>   ZKUtil.updateExistingNodeData(watcher, zkNode, permsData, -1);
> } catch (KeeperException e) {
>   LOG.error("Failed updating permissions for entry '" +
>   entryName + "'", e);
>   watcher.abort("Failed writing node "+zkNode+" to zookeeper", e);
> }
> {code}
> If a cluster has a huge number of tables, then ACL region open will take a long time. 
> For example, it took ~9 minutes to write the ACLs of 60k tables into ZK. 
> We should send the ZK ops in a single multi() call to improve this.
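
One possible shape of that batching, sketched under the assumption that ZKUtil.multiOrSequential and ZKUtil.ZKUtilOp behave as in current branches and that the parent ACL znode already exists; aclZNodeFor is a hypothetical path helper, and this is not the actual patch:

{code}
// Illustrative only: batch all per-table znode writes into multi() calls instead
// of issuing createWithParents + updateExistingNodeData per table.
List<ZKUtil.ZKUtilOp> ops = new ArrayList<>();
for (Map.Entry<byte[], ListMultimap<String, UserPermission>> t : tables.entrySet()) {
  byte[] entry = t.getKey();
  byte[] serialized = PermissionStorage.writePermissionsAsBytes(t.getValue(), conf);
  String zkNode = aclZNodeFor(entry);  // hypothetical: znode path for this ACL entry
  // createAndFailSilent tolerates an already-existing node; setData then writes
  // the latest permissions, so no per-table create/update round trips are needed.
  ops.add(ZKUtil.ZKUtilOp.createAndFailSilent(zkNode, serialized));
  ops.add(ZKUtil.ZKUtilOp.setData(zkNode, serialized));
}
ZKUtil.multiOrSequential(watcher, ops, false);
{code}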



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24695) FSHLog - close the current WAL file in a background thread

2020-08-01 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24695.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to branch-2 and trunk.  Thanks for the reviews Duo and Ram.

> FSHLog - close the current WAL file in a background thread
> --
>
> Key: HBASE-24695
> URL: https://issues.apache.org/jira/browse/HBASE-24695
> Project: HBase
>  Issue Type: Improvement
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> We have this as a TODO in code already
> {code}
> // It is at the safe point. Swap out writer from under the blocked writer thread.
>   // TODO: This is close is inline with critical section. Should happen in background?
>   if (this.writer != null) {
> oldFileLen = this.writer.getLength();
> try {
>   TraceUtil.addTimelineAnnotation("closing writer");
>   this.writer.close();
>   TraceUtil.addTimelineAnnotation("writer closed");
>   this.closeErrorCount.set(0);
> }
> {code}
> This close call is in the critical section while writes are blocked. Let's move 
> this close call into a separate WALCloser thread. 
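
A minimal sketch of the idea, assuming an FSHLog-owned single-thread executor and using Writer to stand for the WAL writer type; names such as walCloseExecutor are illustrative, and the committed change may differ:

{code}
// Illustrative only: close the old writer on a background thread so the close()
// I/O is no longer inside the roll critical section where appends are blocked.
private final ExecutorService walCloseExecutor =
    Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "WAL-Closer");
      t.setDaemon(true);
      return t;
    });

private void closeWriterInBackground(final Writer oldWriter) {
  walCloseExecutor.execute(() -> {
    try {
      oldWriter.close();  // slow close happens off the critical path
    } catch (IOException e) {
      LOG.warn("Failed to close old WAL writer in background", e);
    }
  });
}
{code}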



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-21721) reduce write#syncs() times

2020-08-01 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169286#comment-17169286
 ] 

Anoop Sam John commented on HBASE-21721:


[~Bo Cui] The PR might not apply any more. Can you please raise a new PR? Thanks. 
Will commit once the new one is in and QA is green.

> reduce write#syncs() times
> --
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> The number of writer#sync() calls can be reduced by tracking highestUnsyncedSequence:
> before writer#sync(), capture the current highestUnsyncedSequence; 
> after writer#sync(), advance highestSyncedSequence to that captured value.
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run()
> {
> long currentSequence;
>   while (!isInterrupted()) {
> int syncCount = 0;
> try {
>   while (true) {
> ...
>   try {
> Trace.addTimelineAnnotation("syncing writer");
> long unSyncedFlushSeq = highestUnsyncedSequence;
> writer.sync();
> Trace.addTimelineAnnotation("writer synced");
> if (unSyncedFlushSeq > currentSequence) currentSequence = unSyncedFlushSeq;
> currentSequence = updateHighestSyncedSequence(currentSequence);
>   } catch (IOException e) {
> LOG.error("Error syncing, request close of WAL", e);
> lastException = e;
>   } catch (Exception e) {
>...
> }
> }
> {code}
> The proposed change adds these two lines:
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if (unSyncedFlushSeq > currentSequence) currentSequence = unSyncedFlushSeq;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24695) FSHLog - close the current WAL file in a background thread

2020-08-01 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24695:
---
Fix Version/s: 2.4.0
   3.0.0-alpha-1

> FSHLog - close the current WAL file in a background thread
> --
>
> Key: HBASE-24695
> URL: https://issues.apache.org/jira/browse/HBASE-24695
> Project: HBase
>  Issue Type: Improvement
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> We have this as a TODO in code already
> {code}
> // It is at the safe point. Swap out writer from under the blocked writer thread.
>   // TODO: This is close is inline with critical section. Should happen in background?
>   if (this.writer != null) {
> oldFileLen = this.writer.getLength();
> try {
>   TraceUtil.addTimelineAnnotation("closing writer");
>   this.writer.close();
>   TraceUtil.addTimelineAnnotation("writer closed");
>   this.closeErrorCount.set(0);
> }
> {code}
> This close call is in the critical section while writes are blocked. Let's move 
> this close call into a separate WALCloser thread. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-07-30 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167671#comment-17167671
 ] 

Anoop Sam John commented on HBASE-18070:


I was thinking about this and arrived at the same conclusion Andy stated above. It is 
just like the client-side meta location cache, which can be stale. Even in a double 
assign kind of issue, the meta cache can already cause this problem today. I also 
thought of network partitions, but all such issues can happen even today.
Also, we will use the replica reads only for the region location, right? Not for 
other things like the table state / namespace data, etc.
I think then we are good.
Will wait for the design doc [~huaxiangsun]

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Major
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167198#comment-17167198
 ] 

Anoop Sam John commented on HBASE-24791:


Checking further, HBASE-17825 added this code, so it is applicable only to 
master. Thanks [~pankajkumar] for checking 2.2 for this code path.

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1
>
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.
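
A minimal sketch of the kind of change implied here: compute the table-relative path once and reuse it in write(), instead of recomputing it per cell. The caching field and helper below are illustrative, not the committed form of the patch:

{code}
// Illustrative only: cache the value instead of calling getTableRelativePath()
// on every write() invocation.
private Path tableRelativePath;  // computed lazily, reused afterwards

private Path getOrComputeTableRelativePath(byte[] tableNameBytes) throws IOException {
  if (tableRelativePath == null) {
    tableRelativePath = getTableRelativePath(tableNameBytes);  // the formerly per-cell call
  }
  return tableRelativePath;
}
{code}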



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24791:
---
Fix Version/s: (was: 2.2.6)
   (was: 2.4.0)
   (was: 2.3.1)

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1
>
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24791:
---
Affects Version/s: (was: 2.0.0)
   3.0.0-alpha-1

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24790) Remove unused counter from SplitLogCounters

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24790.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master. Thanks for the patch [~chenyechao]

> Remove unused counter from SplitLogCounters
> ---
>
> Key: HBASE-24790
> URL: https://issues.apache.org/jira/browse/HBASE-24790
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> remove unused counter from SplitLogCounters



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24790) Remove unused counter from SplitLogCounters

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24790:
---
Fix Version/s: 3.0.0-alpha-1

> Remove unused counter from SplitLogCounters
> ---
>
> Key: HBASE-24790
> URL: https://issues.apache.org/jira/browse/HBASE-24790
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> remove unused counter from SplitLogCounters



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24791:
---
Priority: Critical  (was: Major)

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-07-29 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167048#comment-17167048
 ] 

Anoop Sam John commented on HBASE-24754:


Yes, I agree with [~chenyechao]. [~a00408367], can you test with the patch in 
HBASE-24791?

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input: 
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 RegionServers)
>  4: The number of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, with the same test input for both 
> clusters.
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24791:
---
Affects Version/s: (was: 2.2.5)
   (was: 2.3.0)
   2.0.0

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
>  Labels: HFileOutputFormat, bulkload
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167050#comment-17167050
 ] 

Anoop Sam John commented on HBASE-24791:


How many columns are in your test? If that number is large, the impact will be as 
big as what you are seeing. I think this should be marked as a bug fix, as it was 
a perf regression in 2.0.

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-24791:
---
Fix Version/s: 2.2.6
   2.4.0
   2.3.1
   3.0.0-alpha-1

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>
>
> Bulk load uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method calls getTableRelativePath 
> every time it is invoked.
> This is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24665) MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-07-28 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166874#comment-17166874
 ] 

Anoop Sam John commented on HBASE-24665:


[~wenfeiyi666], there is an issue with the walRollFinished() API which we discussed 
in the PR review. Can you please raise a Jira to track that too?

> MultiWAL :  Avoid rolling of ALL WALs when one of the WAL needs a roll
> --
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.7
>
>
> When using MultiWAL, if any one WAL requests a roll, all WALs are rolled together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24753) HA masters based on raft

2020-07-27 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166154#comment-17166154
 ] 

Anoop Sam John commented on HBASE-24753:


bq.With this solution in place, as long as root table will not be in a format 
of region(we could just use rocksdb to store it locally), 
There is one interesting cloud use case: drop a cluster and recreate it later on the 
existing data. This was/is possible because we never store any persistent data 
locally, only on the FS. I would say let's not break that. I read in another Jira 
that the root table data could be stored locally (with Raft in place) rather than 
on the FS. I would say let's not do that either. Let us continue to have the 
storage isolation.

> HA masters based on raft
> 
>
> Key: HBASE-24753
> URL: https://issues.apache.org/jira/browse/HBASE-24753
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Reporter: Duo Zhang
>Priority: Major
>
> For better availability, move the bootstrap information from zookeeper to 
> our own service, so that finally we could remove the dependency on zookeeper 
> completely.
> This has been in my mind for a long time, and since the there is a dicussion 
> in HBASE-11288 about how to storing root table, and also in HBASE-24749, we 
> want to have better performance on a filesystem can not support list and 
> rename well, where requires a storage engine at the bottom to store the 
> storefiles information for meta table, I think it is the time to throw this 
> idea out.
> The basic solution is to build a raft group to store the bootstrap 
> information, for now it is cluster id(it is on the file system already?) and 
> the root table. For region servers they will always go to the leader to ask 
> for the information so they can always see the newest data, and for client, 
> we enable 'follower read', to reduce the load of the leader(and there are 
> some solutions to even let 'follower read' to always get the newest data in 
> raft).
> With this solution in place, as long as root table will not be in a format of 
> region(we could just use rocksdb to store it locally), the cyclic dependency 
> in HBASE-24749 has also been solved, as we do not need to find a place to 
> store the storefiles information for root table any more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

