[jira] [Created] (HBASE-14448) Refine RegionGroupingProvider Phase-2: remove provider nesting and formalize wal group name

2015-09-16 Thread Yu Li (JIRA)
Yu Li created HBASE-14448:
-

 Summary: Refine RegionGroupingProvider Phase-2: remove provider 
nesting and formalize wal group name
 Key: HBASE-14448
 URL: https://issues.apache.org/jira/browse/HBASE-14448
 Project: HBase
  Issue Type: Improvement
Reporter: Yu Li
Assignee: Yu Li


Currently we nest DefaultWALProvider inside RegionGroupingProvider, which 
makes the logic ambiguous, since a "provider" should itself provide logs. 
I suggest instantiating FSHLog directly in RegionGroupingProvider.

Regarding the WAL group name, RegionGroupingProvider currently uses something like 
"-null-", which is quite long and unnecessary. I suggest 
using "." directly.

For more details, please refer to the initial patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791650#comment-14791650
 ] 

Yu Li commented on HBASE-14411:
---

From the [testReport | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15614//testReport/org.apache.hadoop.hbase.regionserver/TestWALLockup/testLockupWhenSyncInMiddleOfZigZagSetup/],
the failure of this case is most likely caused by an intermittent environment issue. Below is the 
exception thrown in TestWALLockup:
{noformat}
Caused by: java.io.IOException: FAKE! Failed to replace a bad datanode...APPEND
    at org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.append(TestWALLockup.java:173)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1880)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1748)
{noformat}

Thanks [~eclark] for the attention, and [~tedyu] for help taking a look.

> Fix unit test failures when using multiwal as default WAL provider
> --
>
> Key: HBASE-14411
> URL: https://issues.apache.org/jira/browse/HBASE-14411
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, 
> HBASE-14411_v2.patch
>
>
> If we set hbase.wal.provider to multiwal in 
> hbase-server/src/test/resources/hbase-site.xml, which allows us to use 
> BoundedRegionGroupingProvider in unit tests, we observe the failures below in the 
> current code base:
> {noformat}
> Failed tests:
>   TestHLogRecordReader>TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2>
>   TestHLogRecordReader>TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3>
>   TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2>
>   TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3>
>   TestDistributedLogSplitting.testRecoveredEdits:276 edits dir should have more than a single file in it. instead has 1
>   TestAtomicOperation.testMultiRowMutationMultiThreads:499 expected:<0> but was:<1>
>   TestHRegionServerBulkLoad.testAtomicBulkLoad:307
> Expected: is 
>  but: was 
>   TestLogRolling.testCompactionRecordDoesntBlockRolling:611 Should have WAL; one table is not flushed expected:<1> but was:<0>
>   TestLogRolling.testLogRollOnDatanodeDeath:359 null
>   TestLogRolling.testLogRollOnPipelineRestart:472 Missing datanode should've triggered a log roll
>   TestReplicationSourceManager.testLogRoll:237 expected:<6> but was:<7>
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if skip.errors is false all files should remain in place expected:<11> but was:<12>
>   TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the archive log expected:<11> but was:<12>
>   TestWALSplit.testMovedWALDuringRecovery:810->retryOverHdfsProblem:793 expected:<11> but was:<12>
>   TestWALSplit.testRetryOpenDuringRecovery:838->retryOverHdfsProblem:793 expected:<11> but was:<12>
>   TestWALSplitCompressed>TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if skip.errors is false all files should remain in place expected:<11> but was:<12>
>   TestWALSplitCompressed>TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the archive log expected:<11> but was:<12>
>   TestWALSplitCompressed>TestWALSplit.testMovedWALDuringRecovery:810->TestWALSplit.retryOverHdfsProblem:793 expected:<11> but was:<12>
>   TestWALSplitCompressed>TestWALSplit.testRetryOpenDuringRecovery:838->TestWALSplit.retryOverHdfsProblem:793 expected:<11> but was:<12>
> {noformat}
> While the patch for HBASE-14306 could resolve the failures of TestHLogRecordReader, 
> TestReplicationSourceManager and TestReplicationWALReaderManager, this JIRA 
> will focus on resolving the others.





[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791641#comment-14791641
 ] 

Hudson commented on HBASE-14082:


SUCCESS: Integrated in HBase-1.2-IT #153 (See 
[https://builds.apache.org/job/HBase-1.2-IT/153/])
HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev 
9f420d0ac6175a7245efe68c27fc32458bca1b86)
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java


> Add replica id to JMX metrics names
> ---
>
> Key: HBASE-14082
> URL: https://issues.apache.org/jira/browse/HBASE-14082
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Lei Chen
>Assignee: Lei Chen
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14082-v6.patch, HBASE-14082-v1.patch, 
> HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, 
> HBASE-14082-v5.patch
>
>
> Today, via JMX, one cannot distinguish a primary region from a replica. A 
> possible solution is to add replica id to JMX metrics names. The benefits may 
> include, for example:
> # Knowing the latency of a read request on a replica region means the first 
> attempt to the primary region has timed out.
> # Write requests on replicas are due to the replication process, while the 
> ones on the primary are from clients.
> # When looking for hot spots of read operations, replicas should be 
> excluded since TIMELINE reads are sent to all replicas.
> To implement, we can change the format of metrics names found at 
> {code}Hadoop->HBase->RegionServer->Regions->Attributes{code}
> from 
> {code}namespace__table__region__metric_{code}
> to
> {code}namespace__table__region__replicaid__metric_{code}
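
As a rough illustration of the naming change described above, the sketch below builds a region metric name with and without a replica id segment. The class and method names (RegionMetricNames, legacyName, withReplicaId) are hypothetical and are not taken from the HBASE-14082 patch. The placeholders in the description are not visible, so the "label, underscore, value" layout used here is an assumption.

```java
// Hypothetical helper: not part of HBase. It only illustrates adding a
// replica id segment between the region and metric segments of a name.
final class RegionMetricNames {
    private RegionMetricNames() {}

    // Assumed old layout: namespace, table, region, metric.
    static String legacyName(String namespace, String table, String region, String metric) {
        return "namespace_" + namespace
            + "_table_" + table
            + "_region_" + region
            + "_metric_" + metric;
    }

    // Assumed new layout: same as above, with the replica id before the metric.
    static String withReplicaId(String namespace, String table, String region,
                                int replicaId, String metric) {
        return "namespace_" + namespace
            + "_table_" + table
            + "_region_" + region
            + "_replicaid_" + replicaId
            + "_metric_" + metric;
    }

    public static void main(String[] args) {
        System.out.println(legacyName("default", "t1", "abc123", "readRequestCount"));
        System.out.println(withReplicaId("default", "t1", "abc123", 1, "readRequestCount"));
    }
}
```

With a layout like this, a JMX consumer could filter on the replica id segment to separate primary regions from replicas, which is the benefit the description lists.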





[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791628#comment-14791628
 ] 

Hudson commented on HBASE-14278:


FAILURE: Integrated in HBase-TRUNK #6817 (See 
[https://builds.apache.org/job/HBase-TRUNK/6817/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
c1ac4bb8601f88eb3fe246eb62c3f40e95faf93d)
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j

[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791629#comment-14791629
 ] 

Hudson commented on HBASE-14082:


FAILURE: Integrated in HBase-TRUNK #6817 (See 
[https://builds.apache.org/job/HBase-TRUNK/6817/])
HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev 
17bdf9fa8cbe920578c09c38960dd0450746fe5c)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java







[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791630#comment-14791630
 ] 

Hudson commented on HBASE-14274:


FAILURE: Integrated in HBase-TRUNK #6817 (See 
[https://builds.apache.org/job/HBase-TRUNK/6817/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
c1ac4bb8601f88eb3fe246eb62c3f40e95faf93d)
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java


> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs 
> MetricsRegionAggregateSourceImpl
> ---
>
> Key: HBASE-14274
> URL: https://issues.apache.org/jira/browse/HBASE-14274
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, 
> HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 waiting on condition [0x00011f7ac000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00075636d8c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>   at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>   at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>   - locked <0x0007ff878190> (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to 
> get the write lock on that class's local ReentrantReadWriteLock while holding 
> the write lock of MetricsRegionSourceImpl's readWriteLock.
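
The hazard in this report is a lock-ordering one: two code paths take the same pair of ReentrantReadWriteLocks in opposite orders. The sketch below is not the HBASE-14274 patch. It is a minimal illustration, with hypothetical names (LockOrderingSketch, aggregateLock, regionLock), of one common remedy: have every path take the two locks in the same order.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of two nested read/write locks. Both paths acquire
// aggregateLock first and regionLock second, so neither thread can hold
// one lock while waiting for the other in the opposite order.
final class LockOrderingSketch {
    private final ReentrantReadWriteLock aggregateLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private boolean registered = true;

    // Close path: write locks, aggregate first, then region.
    void closeRegion() {
        aggregateLock.writeLock().lock();
        try {
            regionLock.writeLock().lock();
            try {
                registered = false;
            } finally {
                regionLock.writeLock().unlock();
            }
        } finally {
            aggregateLock.writeLock().unlock();
        }
    }

    // Snapshot path: read locks, taken in the same order as the close path.
    boolean snapshot() {
        aggregateLock.readLock().lock();
        try {
            regionLock.readLock().lock();
            try {
                return registered;
            } finally {
                regionLock.readLock().unlock();
            }
        } finally {
            aggregateLock.readLock().unlock();
        }
    }

    // Runs both paths concurrently; returns true if both threads finished in time.
    static boolean runConcurrently(int iterations) {
        LockOrderingSketch sketch = new LockOrderingSketch();
        Thread closer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.closeRegion();
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) sketch.snapshot();
        });
        closer.setDaemon(true);
        reader.setDaemon(true);
        closer.start();
        reader.start();
        try {
            closer.join(10_000);
            reader.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !closer.isAlive() && !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```

Other remedies exist, such as releasing the inner lock before calling out to the other object; which one suits the metrics classes depends on details not shown in this report.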
> Meanwhile, the JmxCacheBuster is running elsewhere, trying to get metrics with 
> the above locks taken in the reverse order:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting on condition [0x000140ea5000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cade1480> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>   at org.apache.hadoop.hb

[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791615#comment-14791615
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-0.98 #1125 (See 
[https://builds.apache.org/job/HBase-0.98/1125/])
HBASE-13250 Revert due to compilation error against hadoop-1 profile (tedyu: 
rev 38995fbd51ac4735b673dd1527cb2631b69b7474)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev 88a620892883ac878bde3ea3c64c7275600b7085)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files is not changed as expected.





[jira] [Commented] (HBASE-12751) Allow RowLock to be reader writer

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791603#comment-14791603
 ] 

stack commented on HBASE-12751:
---

Dang. The hangs are legit and reproducible. Will be back after I try to figure 
out why.

> Allow RowLock to be reader writer
> -
>
> Key: HBASE-12751
> URL: https://issues.apache.org/jira/browse/HBASE-12751
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 12751.rebased.v25.txt, 12751.rebased.v26.txt, 
> 12751.rebased.v26.txt, 12751.rebased.v27.txt, 12751.rebased.v29.txt, 
> 12751.rebased.v31.txt, 12751.rebased.v32.txt, 12751.rebased.v32.txt, 
> 12751.rebased.v33.txt, 12751.rebased.v34.txt, 12751.rebased.v35.txt, 
> 12751.rebased.v35.txt, 12751.rebased.v35.txt, 12751.v37.txt, 12751.v38.txt, 
> 12751v22.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 
> 12751v36.txt, HBASE-12751-v1.patch, HBASE-12751-v10.patch, 
> HBASE-12751-v10.patch, HBASE-12751-v11.patch, HBASE-12751-v12.patch, 
> HBASE-12751-v13.patch, HBASE-12751-v14.patch, HBASE-12751-v15.patch, 
> HBASE-12751-v16.patch, HBASE-12751-v17.patch, HBASE-12751-v18.patch, 
> HBASE-12751-v19 (1).patch, HBASE-12751-v19.patch, HBASE-12751-v2.patch, 
> HBASE-12751-v20.patch, HBASE-12751-v20.patch, HBASE-12751-v21.patch, 
> HBASE-12751-v3.patch, HBASE-12751-v4.patch, HBASE-12751-v5.patch, 
> HBASE-12751-v6.patch, HBASE-12751-v7.patch, HBASE-12751-v8.patch, 
> HBASE-12751-v9.patch, HBASE-12751.patch
>
>
> Right now every write operation grabs a row lock. This is to prevent values 
> from changing during a read modify write operation (increment or check and 
> put). However it limits parallelism in several different scenarios.
> If there are several puts to the same row but different columns or stores 
> then this is very limiting.
> If there are puts to the same column then mvcc number should ensure a 
> consistent ordering. So locking is not needed.
> However locking for check and put or increment is still needed.
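
To make the reader/writer idea above concrete, here is a minimal sketch and not the HBase implementation. The names (RowLockSketch, put, increment) are hypothetical. Plain puts take the shared side of a per-row ReentrantReadWriteLock so they can proceed in parallel, while a read-modify-write takes the exclusive side. The mvcc ordering the description relies on is not modelled here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical per-row reader/writer lock. Cells are keyed by "row/column".
final class RowLockSketch {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> rowLocks =
        new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantReadWriteLock());
    }

    // Plain put: shared lock, so puts to the same row do not block each other.
    void put(String row, String column, long value) {
        ReentrantReadWriteLock lock = lockFor(row);
        lock.readLock().lock();
        try {
            cells.put(row + "/" + column, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Read-modify-write: exclusive lock, so no put or increment can
    // change the value between the read and the write.
    long increment(String row, String column, long delta) {
        ReentrantReadWriteLock lock = lockFor(row);
        lock.writeLock().lock();
        try {
            String key = row + "/" + column;
            long updated = cells.getOrDefault(key, 0L) + delta;
            cells.put(key, updated);
            return updated;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RowLockSketch store = new RowLockSketch();
        store.put("row1", "c1", 5L);
        System.out.println(store.increment("row1", "c1", 3L));
    }
}
```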





[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791601#comment-14791601
 ] 

stack commented on HBASE-11590:
---

Should we lower the keepalive timeout so it is only seconds?  We have 
allowCoreThreadTimeOut(true);  core threads would run up to the max but could 
also go down to zero, as noted in 
http://stackoverflow.com/questions/19528304/how-to-get-the-threadpoolexecutor-to-increase-threads-to-max-before-queueing/19528305#19528305
  Alternatively, the suggestion by Ralph H (answered Oct 23 '13 at 10:15 in the link) 
looks simple (after executing the current reset the core thread size if not 
enough for current requests).  There is also a new answer at the end... with a GPL 
solution.
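
For reference, a minimal sketch of the configuration discussed here, using only the standard java.util.concurrent API. The class name TimedOutCorePool and the parameter values are illustrative and not taken from the attached patch. The core size equals the maximum size, and allowCoreThreadTimeOut(true) lets idle core threads exit after the keepalive, so the pool can shrink back to zero.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative factory: a pool that grows up to maxThreads as tasks are
// submitted and lets all threads, core ones included, time out when idle.
final class TimedOutCorePool {
    private TimedOutCorePool() {}

    static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,
            keepAliveSeconds, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>());
        // Requires a keepalive greater than zero, otherwise the JDK
        // throws IllegalArgumentException.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(8, 2);
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> { });
        }
        System.out.println("threads after 3 tasks: " + pool.getPoolSize());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Note that with this setup the JDK still starts a new thread for each submitted task until the core size is reached, even if existing threads are idle; the short keepalive only bounds how long the extra threads linger.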

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99th latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere that we can use, or 
> a good reason not to do this. So feedback is welcome, as usual. 





[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98

2015-09-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791588#comment-14791588
 ] 

Lars Hofhansl commented on HBASE-14404:
---

I think it's fine either way. +1 on backport.

> Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
> ---
>
> Key: HBASE-14404
> URL: https://issues.apache.org/jira/browse/HBASE-14404
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.98.15
>
> Attachments: HBASE-14404-0.98.patch
>
>
> HBASE-14098 adds a new configuration toggle - 
> "hbase.hfile.drop.behind.compaction" - which if set to "true" tells 
> compactions to drop pages from the OS blockcache after write.  It's on by 
> default where committed so far but a backport to 0.98 would default it to 
> off. (The backport would also retain compat methods to LimitedPrivate 
> interface StoreFileScanner.) What could make it a controversial change in 
> 0.98 is it changes the default setting of 
> 'hbase.regionserver.compaction.private.readers' from "false" to "true".  I 
> think it's fine, we use private readers in production. They're stable and do 
> not present perf issues.





[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98

2015-09-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791577#comment-14791577
 ] 

Andrew Purtell commented on HBASE-14404:


... which seems fine, but we could change the backport to pick up the default 
setting from the hadoop config if the HBase configuration doesn't specify one 
way or the other. 
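
One reading of this suggestion, sketched with plain maps instead of the real Hadoop and HBase Configuration classes: use the HBase key if it is set, otherwise fall back to a Hadoop client setting, otherwise default to off. The class name DropBehindDefault is hypothetical, and the Hadoop key name used below is an assumption that should be checked against the Hadoop version in use.

```java
import java.util.Map;

// Hypothetical resolver for the drop-behind default. Maps stand in for
// the HBase and Hadoop configuration objects.
final class DropBehindDefault {
    static final String HBASE_KEY = "hbase.hfile.drop.behind.compaction";
    // Assumed Hadoop client key; verify against your Hadoop release.
    static final String HADOOP_KEY = "dfs.client.cache.drop.behind.writes";

    private DropBehindDefault() {}

    static boolean resolve(Map<String, String> hbaseConf, Map<String, String> hadoopConf) {
        String explicit = hbaseConf.get(HBASE_KEY);
        if (explicit != null) {
            // The HBase configuration specifies the toggle one way or the other.
            return Boolean.parseBoolean(explicit);
        }
        // Otherwise inherit from the Hadoop config; a missing key means off,
        // because Boolean.parseBoolean(null) returns false.
        return Boolean.parseBoolean(hadoopConf.get(HADOOP_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> hbase = new java.util.HashMap<>();
        Map<String, String> hadoop = new java.util.HashMap<>();
        hadoop.put(HADOOP_KEY, "true");
        System.out.println(resolve(hbase, hadoop));
    }
}
```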






[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98

2015-09-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791575#comment-14791575
 ] 

Andrew Purtell commented on HBASE-14404:


HBase is changing the setting for the embedded DFSclient in HBase, 






[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791565#comment-14791565
 ] 

Lei Chen commented on HBASE-14082:
--

Thank you all for helping me all the way.






[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791563#comment-14791563
 ] 

Hudson commented on HBASE-14274:


FAILURE: Integrated in HBase-1.2 #180 (See 
[https://builds.apache.org/job/HBase-1.2/180/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs 
> MetricsRegionAggregateSourceImpl
> ---
>
> Key: HBASE-14274
> URL: https://issues.apache.org/jira/browse/HBASE-14274
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, 
> HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 
> waiting on condition [0x00011f7ac000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00075636d8c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>   - locked <0x0007ff878190> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want 
> to get a write lock on this class's local ReentrantReadWriteLock while holding 
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the 
> above locks held in reverse order:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting 
> on condition [0x000140ea5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cade1480> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>   at 
> org.apache.hadoop.hbase.re
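
The two thread dumps quoted above describe a lock-order inversion: one thread holds the region source's lock and waits for the aggregate's lock, while the other does the opposite. One general way to avoid this class of hang is to have every code path take the two locks in the same order. The sketch below shows that technique only. It is not the change made in the HBASE-14274 patches, and the lock and method names are stand-ins, not the real HBase fields.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: these locks are stand-ins, not the real HBase fields.
final class LockOrderSketch {
    static final ReentrantReadWriteLock AGGREGATE_LOCK = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock REGION_LOCK = new ReentrantReadWriteLock();

    // Close path: aggregate lock first, then region lock.
    static void close(AtomicInteger closes) {
        AGGREGATE_LOCK.writeLock().lock();
        try {
            REGION_LOCK.writeLock().lock();
            try {
                closes.incrementAndGet();
            } finally {
                REGION_LOCK.writeLock().unlock();
            }
        } finally {
            AGGREGATE_LOCK.writeLock().unlock();
        }
    }

    // Snapshot path: same order as close(), but read locks only.
    static void snapshot(AtomicInteger snapshots) {
        AGGREGATE_LOCK.readLock().lock();
        try {
            REGION_LOCK.readLock().lock();
            try {
                snapshots.incrementAndGet();
            } finally {
                REGION_LOCK.readLock().unlock();
            }
        } finally {
            AGGREGATE_LOCK.readLock().unlock();
        }
    }

    // Runs both paths concurrently and returns {closes, snapshots}.
    static int[] run(int iterations) {
        AtomicInteger closes = new AtomicInteger();
        AtomicInteger snapshots = new AtomicInteger();
        Thread closer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) close(closes);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) snapshot(snapshots);
        });
        closer.start();
        reader.start();
        try {
            closer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new int[] {closes.get(), snapshots.get()};
    }

    public static void main(String[] args) {
        int[] counts = run(1000);
        System.out.println("closes=" + counts[0] + " snapshots=" + counts[1]);
    }
}
```

Because both paths acquire AGGREGATE_LOCK before REGION_LOCK, neither thread can hold one lock while waiting on a thread that holds the other, so the run always completes.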

[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791562#comment-14791562
 ] 

Hudson commented on HBASE-14278:


FAILURE: Integrated in HBase-1.2 #180 (See 
[https://builds.apache.org/job/HBase-1.2/180/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] 
> impl.MetricsConfig(124): Cannot locate configuration: tried 
> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] 
> lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at 
> org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at 
> org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:29

[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791564#comment-14791564
 ] 

Hudson commented on HBASE-14334:


FAILURE: Integrated in HBase-1.2 #180 (See 
[https://builds.apache.org/job/HBase-1.2/180/])
HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: 
rev 20f272cb7fdb87598f3e995467853c3770faab55)
* pom.xml
* hbase-server/pom.xml
* hbase-assembly/src/main/assembly/hadoop-two-compat.xml
* 
hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* hbase-external-blockcache/pom.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-assembly/pom.xml


> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791554#comment-14791554
 ] 

Hudson commented on HBASE-14278:


FAILURE: Integrated in HBase-1.3 #182 (See 
[https://builds.apache.org/job/HBase-1.3/182/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
2029e851827fa1bf59436c7baa1971b52ac5833e)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] 
> impl.MetricsConfig(124): Cannot locate configuration: tried 
> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] 
> lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at 
> org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at 
> org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at 
> org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:29

[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791556#comment-14791556
 ] 

Hudson commented on HBASE-14274:


FAILURE: Integrated in HBase-1.3 #182 (See 
[https://builds.apache.org/job/HBase-1.3/182/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
2029e851827fa1bf59436c7baa1971b52ac5833e)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs 
> MetricsRegionAggregateSourceImpl
> ---
>
> Key: HBASE-14274
> URL: https://issues.apache.org/jira/browse/HBASE-14274
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, 
> HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 
> waiting on condition [0x00011f7ac000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00075636d8c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>   - locked <0x0007ff878190> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want 
> to get a write lock on this class's local ReentrantReadWriteLock while holding 
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the 
> above locks held in reverse order:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting 
> on condition [0x000140ea5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cade1480> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>   at 
> org.apache.hadoop.hbase.re

[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791555#comment-14791555
 ] 

Hudson commented on HBASE-14082:


FAILURE: Integrated in HBase-1.3 #182 (See 
[https://builds.apache.org/job/HBase-1.3/182/])
HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev 
bb4a690b79a2485d24aa84b9635b7fea0ff6b0d4)
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java


> Add replica id to JMX metrics names
> ---
>
> Key: HBASE-14082
> URL: https://issues.apache.org/jira/browse/HBASE-14082
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Lei Chen
>Assignee: Lei Chen
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14082-v6.patch, HBASE-14082-v1.patch, 
> HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, 
> HBASE-14082-v5.patch
>
>
> Today, via JMX, one cannot distinguish a primary region from a replica. A 
> possible solution is to add replica id to JMX metrics names. The benefits may 
> include, for example:
> # Knowing the latency of a read request on a replica region means the first 
> attempt to the primary region has timed out.
> # Write requests on replicas are due to the replication process, while the 
> ones on primary are from clients.
> # In case of looking for hot spots of read operations, replicas should be 
> excluded since TIMELINE reads are sent to all replicas.
> To implement, we can change the format of metrics names found at 
> {code}Hadoop->HBase->RegionServer->Regions->Attributes{code}
> from 
> {code}namespace__table__region__metric_{code}
> to
> {code}namespace__table__region__replicaid__metric_{code}
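
The third benefit listed in the quoted description (excluding replicas when looking for read hot spots) could be used on the consumer side roughly as below. This is a sketch under assumptions: it supposes the replica id appears as a "_replicaid_<id>_metric_" segment in the metric name and that id 0 denotes the primary region. The class name and the sample metric names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: ReplicaFilter is hypothetical and the name layout is assumed.
final class ReplicaFilter {
    private static final Pattern REPLICA_ID =
        Pattern.compile("_replicaid_(\\d+)_metric_");

    private ReplicaFilter() {}

    // Returns the replica id embedded in the name, or -1 if none is present.
    static int replicaIdOf(String metricName) {
        Matcher m = REPLICA_ID.matcher(metricName);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Keeps only metrics whose replica id is 0 (assumed to be the primary).
    static List<String> primaryOnly(List<String> metricNames) {
        List<String> result = new ArrayList<>();
        for (String name : metricNames) {
            if (replicaIdOf(name) == 0) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "namespace_default_table_t_region_r_replicaid_0_metric_readRequestCount",
            "namespace_default_table_t_region_r_replicaid_1_metric_readRequestCount");
        System.out.println(primaryOnly(names));
    }
}
```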





[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791557#comment-14791557
 ] 

Hudson commented on HBASE-14334:


FAILURE: Integrated in HBase-1.3 #182 (See 
[https://builds.apache.org/job/HBase-1.3/182/])
HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: 
rev d4d398d9420506b00562c180259501bf2f5401be)
* hbase-server/pom.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* hbase-external-blockcache/pom.xml
* hbase-assembly/src/main/assembly/hadoop-two-compat.xml
* pom.xml
* hbase-assembly/pom.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* 
hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java


> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Updated] (HBASE-14447) Spark tests failing: bind exception when putting up info server

2015-09-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14447:
--
Status: Patch Available  (was: Open)

> Spark tests failing: bind exception when putting up info server
> ---
>
> Key: HBASE-14447
> URL: https://issues.apache.org/jira/browse/HBASE-14447
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Attachments: 14447.patch
>
>
> Got this:
> {code}
> Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext
> Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec 
> <<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext
> testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext)  Time 
> elapsed: 540.647 sec  <<< ERROR!
> java.lang.RuntimeException: java.io.IOException: Shutting down
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
> at 
> org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012)
> at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953)
> at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:603)
> at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:367)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:154)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041)
> at 
> org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82)
> ...
> {code}





[jira] [Updated] (HBASE-14447) Spark tests failing: bind exception when putting up info server

2015-09-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14447:
--
Attachment: 14447.patch

Same as HBASE-14435

> Spark tests failing: bind exception when putting up info server
> ---
>
> Key: HBASE-14447
> URL: https://issues.apache.org/jira/browse/HBASE-14447
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Attachments: 14447.patch
>
>
> Got this:
> {code}
> Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext
> Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec 
> <<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext
> testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext)  Time 
> elapsed: 540.647 sec  <<< ERROR!
> java.lang.RuntimeException: java.io.IOException: Shutting down
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at 
> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
> at 
> org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012)
> at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953)
> at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:603)
> at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:367)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218)
> at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:154)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041)
> at 
> org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82)
> ...
> {code}





[jira] [Created] (HBASE-14447) Spark tests failing: bind exception when putting up info server

2015-09-16 Thread stack (JIRA)
stack created HBASE-14447:
-

 Summary: Spark tests failing: bind exception when putting up info 
server
 Key: HBASE-14447
 URL: https://issues.apache.org/jira/browse/HBASE-14447
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
Priority: Minor


Got this:

{code}
Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext
Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext
testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext)  Time 
elapsed: 540.647 sec  <<< ERROR!
java.lang.RuntimeException: java.io.IOException: Shutting down
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at 
org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012)
at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953)
at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:603)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:367)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:154)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
at 
org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041)
at 
org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82)
...
{code}
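
The frames above pass through sun.nio.ch.Net.bind, which suggests the info server's port was already taken when the test master came up. As a rough illustration only, the JDK-only sketch below shows how a second bind to an in-use port fails, and how binding to port 0 lets the OS pick a free one. The class PortBindSketch and its methods are hypothetical names and are not HBase code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration using only the JDK; not HBase code.
class PortBindSketch {

    // Binds the socket to port 0 on loopback so the OS picks a free port, and returns that port.
    static int bindEphemeral(ServerSocket socket) throws IOException {
        socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return socket.getLocalPort();
    }

    // Returns true if binding a second socket to an already-bound port fails with BindException.
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket()) {
            int port = bindEphemeral(first);
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                return false; // the second bind unexpectedly succeeded
            } catch (BindException expected) {
                return true;  // the port is in use, as in the issue title
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

For test clusters, hbase-default.xml documents -1 as the value of hbase.master.info.port and hbase.regionserver.info.port that turns the web UI off. Whether that is the right remedy for this particular failure is not settled by the report.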





[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98

2015-09-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791536#comment-14791536
 ] 

Lars Hofhansl commented on HBASE-14404:
---

Looking at the patch in detail now. It does not allow for sticking with the 
default setup for HDFS. Whatever the HBase setting is will override whatever 
was set globally for HDFS, which might be surprising.

> Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
> ---
>
> Key: HBASE-14404
> URL: https://issues.apache.org/jira/browse/HBASE-14404
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.98.15
>
> Attachments: HBASE-14404-0.98.patch
>
>
> HBASE-14098 adds a new configuration toggle - 
> "hbase.hfile.drop.behind.compaction" - which if set to "true" tells 
> compactions to drop pages from the OS blockcache after write.  It's on by 
> default where committed so far but a backport to 0.98 would default it to 
> off. (The backport would also retain compat methods to LimitedPrivate 
> interface StoreFileScanner.) What could make it a controversial change in 
> 0.98 is it changes the default setting of 
> 'hbase.regionserver.compaction.private.readers' from "false" to "true".  I 
> think it's fine, we use private readers in production. They're stable and do 
> not present perf issues.
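
One way to read the concern in the comment above: a plain boolean toggle with a default cannot express "leave the HDFS-wide setting alone". The sketch below treats an unset key as "defer to HDFS". It is only an illustration of that idea, not the patch. java.util.Properties stands in for the Hadoop Configuration class, and the class and method names are hypothetical. Only the key name comes from the issue.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical illustration; java.util.Properties stands in for Hadoop's Configuration.
class DropBehindSketch {
    static final String KEY = "hbase.hfile.drop.behind.compaction"; // key named in the issue

    // Empty result means "not set in HBase": leave whatever HDFS is configured to do.
    static Optional<Boolean> dropBehindOverride(Properties conf) {
        String raw = conf.getProperty(KEY);
        return raw == null ? Optional.empty() : Optional.of(Boolean.parseBoolean(raw.trim()));
    }

    // Resolves the effective value from the optional HBase override and the HDFS-wide default.
    static boolean effectiveDropBehind(Properties conf, boolean hdfsDefault) {
        return dropBehindOverride(conf).orElse(hdfsDefault);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(effectiveDropBehind(conf, true)); // unset: the HDFS default applies
        conf.setProperty(KEY, "false");
        System.out.println(effectiveDropBehind(conf, true)); // set: the HBase value applies
    }
}
```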





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791515#comment-14791515
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1078 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1078/])
HBASE-13250 Revert due to compilation error against hadoop-1 profile (tedyu: 
rev 38995fbd51ac4735b673dd1527cb2631b69b7474)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files is not changed as expected.
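
To make the gap in the description concrete: changing ownership of only the leaf file leaves every directory between the export root and that file untouched. The sketch below lists the paths such a change would need to visit. It is a hedged illustration, not the attached patch. java.nio.file.Path stands in for Hadoop paths, and OwnershipWalkSketch, pathsToChown and the example paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; java.nio.file.Path stands in for Hadoop paths.
class OwnershipWalkSketch {

    // Returns the leaf and each of its ancestors below root, leaf first.
    static List<Path> pathsToChown(Path root, Path leaf) {
        if (!leaf.startsWith(root)) {
            throw new IllegalArgumentException(leaf + " is not under " + root);
        }
        List<Path> result = new ArrayList<>();
        for (Path p = leaf; p != null && !p.equals(root); p = p.getParent()) {
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/export");
        Path leaf = Paths.get("/export/archive/data/ns/table/region/cf/hfile");
        pathsToChown(root, leaf).forEach(System.out::println);
    }
}
```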





[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791507#comment-14791507
 ] 

Hudson commented on HBASE-14334:


FAILURE: Integrated in HBase-TRUNK #6816 (See 
[https://builds.apache.org/job/HBase-TRUNK/6816/])
HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: 
rev 7b08f4c8be60582cd02ba31161be214c9c9d40f9)
* pom.xml
* hbase-server/pom.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-assembly/src/main/assembly/hadoop-two-compat.xml
* 
hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-assembly/pom.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* hbase-external-blockcache/pom.xml


> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14082:
--
   Resolution: Fixed
Fix Version/s: 1.3.0
   1.2.0
   Status: Resolved  (was: Patch Available)

I have committed this to 1.2+. Thanks Lei for the patch. 

> Add replica id to JMX metrics names
> ---
>
> Key: HBASE-14082
> URL: https://issues.apache.org/jira/browse/HBASE-14082
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Lei Chen
>Assignee: Lei Chen
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14082-v6.patch, HBASE-14082-v1.patch, 
> HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, 
> HBASE-14082-v5.patch
>
>
> Today, via JMX, one cannot distinguish a primary region from a replica. A 
> possible solution is to add replica id to JMX metrics names. The benefits may 
> include, for example:
> # Knowing the latency of a read request on a replica region means the first 
> attempt to the primary region has timed out.
> # Write requests on replicas are due to the replication process, while the 
> ones on primary are from clients.
> # In case of looking for hot spots of read operations, replicas should be 
> excluded since TIMELINE reads are sent to all replicas.
> To implement, we can change the format of metrics names found at 
> {code}Hadoop->HBase->RegionServer->Regions->Attributes{code}
> from 
> {code}namespace__table__region__metric_{code}
> to
> {code}namespace__table__region__replicaid__metric_{code}
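
The naming change in the description can be sketched as below. This assumes the doubled underscores in the two format strings mark the places where the namespace, table, region and replica id values go. That reading is an assumption, and the class and method names are hypothetical, not HBase code. With empty values the methods reproduce the literal strings quoted in the description.

```java
// Hypothetical illustration of the naming change; not HBase code.
class ReplicaMetricNameSketch {

    // Current-style prefix: no replica id in the name.
    static String oldPrefix(String namespace, String table, String region) {
        return "namespace_" + namespace + "_table_" + table + "_region_" + region + "_metric_";
    }

    // Proposed prefix: replica id inserted between the region and the metric name.
    static String newPrefix(String namespace, String table, String region, int replicaId) {
        return "namespace_" + namespace + "_table_" + table + "_region_" + region
                + "_replicaid_" + replicaId + "_metric_";
    }

    public static void main(String[] args) {
        System.out.println(oldPrefix("default", "t1", "abc123") + "readRequestCount");
        System.out.println(newPrefix("default", "t1", "abc123", 1) + "readRequestCount");
    }
}
```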





[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791455#comment-14791455
 ] 

Hudson commented on HBASE-14334:


FAILURE: Integrated in HBase-1.2-IT #152 (See 
[https://builds.apache.org/job/HBase-1.2-IT/152/])
HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: 
rev 20f272cb7fdb87598f3e995467853c3770faab55)
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* pom.xml
* hbase-external-blockcache/pom.xml
* hbase-assembly/pom.xml
* hbase-assembly/src/main/assembly/hadoop-two-compat.xml
* 
hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-server/pom.xml


> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791454#comment-14791454
 ] 

Hudson commented on HBASE-14274:


FAILURE: Integrated in HBase-1.2-IT #152 (See 
[https://builds.apache.org/job/HBase-1.2-IT/152/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java


> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs 
> MetricsRegionAggregateSourceImpl
> ---
>
> Key: HBASE-14274
> URL: https://issues.apache.org/jira/browse/HBASE-14274
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, 
> HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 
> waiting on condition [0x00011f7ac000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00075636d8c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>   - locked <0x0007ff878190> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to 
> get a write lock on this class's local ReentrantReadWriteLock while holding 
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the 
> above locks held in reverse order:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting 
> on condition [0x000140ea5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cade1480> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>   at 
> org.apache.hadoop.hb
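
The report describes a classic lock-order inversion: the close path holds the region source's lock and wants the aggregate's, while the metrics thread holds them the other way around. The sketch below shows the usual remedy, which is to take both locks in the same order on every path. It is not the committed fix, and LockOrderSketch and its members are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of consistent lock ordering; not the HBase fix.
class LockOrderSketch {
    private final ReentrantReadWriteLock aggregateLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private int closes = 0; // written only while both write locks are held

    // Close path: aggregate write lock first, then region write lock.
    void close() {
        aggregateLock.writeLock().lock();
        try {
            regionLock.writeLock().lock();
            try {
                closes++;
            } finally {
                regionLock.writeLock().unlock();
            }
        } finally {
            aggregateLock.writeLock().unlock();
        }
    }

    // Snapshot path: read locks taken in the same order, aggregate then region.
    int snapshot() {
        aggregateLock.readLock().lock();
        try {
            regionLock.readLock().lock();
            try {
                return closes;
            } finally {
                regionLock.readLock().unlock();
            }
        } finally {
            aggregateLock.readLock().unlock();
        }
    }

    // Runs both paths concurrently; returns the close count, or -1 if a thread did not finish in time.
    static int runConcurrently(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread closer = new Thread(() -> { for (int i = 0; i < iterations; i++) s.close(); });
        Thread reader = new Thread(() -> { for (int i = 0; i < iterations; i++) s.snapshot(); });
        closer.setDaemon(true);
        reader.setDaemon(true);
        closer.start();
        reader.start();
        try {
            closer.join(10_000);
            reader.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return (closer.isAlive() || reader.isAlive()) ? -1 : s.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("closes completed: " + runConcurrently(1000));
    }
}
```

Because both paths take aggregateLock before regionLock, neither thread can hold one lock while waiting on a thread that holds the other.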

[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791453#comment-14791453
 ] 

Hudson commented on HBASE-14278:


FAILURE: Integrated in HBase-1.2-IT #152 (See 
[https://builds.apache.org/job/HBase-1.2-IT/152/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
a229ac91fbab2608ae89bbe44b1dd05e5aef1183)
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j

[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791448#comment-14791448
 ] 

Hudson commented on HBASE-14278:


SUCCESS: Integrated in HBase-1.3-IT #162 (See 
[https://builds.apache.org/job/HBase-1.3-IT/162/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
2029e851827fa1bf59436c7baa1971b52ac5833e)
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j

[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791450#comment-14791450
 ] 

Hudson commented on HBASE-14334:


SUCCESS: Integrated in HBase-1.3-IT #162 (See 
[https://builds.apache.org/job/HBase-1.3-IT/162/])
HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: 
rev d4d398d9420506b00562c180259501bf2f5401be)
* hbase-assembly/src/main/assembly/hadoop-two-compat.xml
* hbase-external-blockcache/pom.xml
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* hbase-server/pom.xml
* hbase-assembly/pom.xml
* pom.xml
* 
hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java


> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791449#comment-14791449
 ] 

Hudson commented on HBASE-14274:


SUCCESS: Integrated in HBase-1.3-IT #162 (See 
[https://builds.apache.org/job/HBase-1.3-IT/162/])
HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 
2029e851827fa1bf59436c7baa1971b52ac5833e)
* 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java


> Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs 
> MetricsRegionAggregateSourceImpl
> ---
>
> Key: HBASE-14274
> URL: https://issues.apache.org/jira/browse/HBASE-14274
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, 
> HBASE-14274.patch
>
>
> Looking into parent issue, got a hang locally of TestDistributedLogReplay.
> We have region closes here:
> {code}
> "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 
> waiting on condition [0x00011f7ac000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00075636d8c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>   - locked <0x0007ff878190> (a java.lang.Object)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> They are trying to call MetricsRegionAggregateSourceImpl.deregister. They want to 
> get a write lock on this class's local ReentrantReadWriteLock while holding 
> MetricsRegionSourceImpl's readWriteLock write lock.
> Then, elsewhere, the JmxCacheBuster is running, trying to get metrics with the 
> above locks held in reverse order:
> {code}
> "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting 
> on condition [0x000140ea5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cade1480> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193)
>   at 
> org.apache.hadoop.hb

[jira] [Reopened] (HBASE-14275) Backport to 0.98 HBASE-10785 Metas own location should be cached

2015-09-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reopened HBASE-14275:


I'm seeing instability in TestAssignmentManagerOnCluster and 
TestZKLessAMOnCluster and a bisect led back to this change. Let me repeat the 
bisect and update shortly.

> Backport to 0.98 HBASE-10785 Metas own location should be cached
> 
>
> Key: HBASE-14275
> URL: https://issues.apache.org/jira/browse/HBASE-14275
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>Assignee: Jerry He
> Fix For: 0.98.14
>
> Attachments: HBASE-14275-0.98.patch
>
>
> We've seen a similar problem reported on 0.98.
> It is a good improvement to have.
> This will cover HBASE-10785 and the later HBASE-11332.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791393#comment-14791393
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-1.2 #179 (See 
[https://builds.apache.org/job/HBase-1.2/179/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev b243c898e72d835a731d893c853c958072d42038)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files is not changed as expected.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791384#comment-14791384
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-0.98 #1124 (See 
[https://builds.apache.org/job/HBase-0.98/1124/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev bcd986e47b8d633c996c8a2040c2a40b32cb5c59)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791377#comment-14791377
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-1.3 #181 (See 
[https://builds.apache.org/job/HBase-1.3/181/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev 6598f18e564bf06348e99863548546f092808c35)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Updated] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14278:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>     at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>     at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>     at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>     at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>     at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>     at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>     at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>     at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>     at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>     at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:198)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3158)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:5652)
>     ... 32 more
> {code}




[jira] [Created] (HBASE-14446) Save table descriptors and region infos during incremental backup

2015-09-16 Thread Vladimir Rodionov (JIRA)
Vladimir Rodionov created HBASE-14446:
-

 Summary: Save table descriptors and region infos during 
incremental backup 
 Key: HBASE-14446
 URL: https://issues.apache.org/jira/browse/HBASE-14446
 Project: HBase
  Issue Type: Sub-task
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov
 Fix For: 2.0.0


The current implementation of incremental backup just moves WAL files into 
the backup directory. The restore procedure for an incremental backup relies 
on the full restore (from snapshot) as the source of all table metadata. 

Two problems: 

# Table configuration/properties may change after the full backup, and we will 
lose this info during restore.
# We cannot convert WAL files into HFiles without the table description and 
layout. This is needed for the merge tool. 





[jira] [Work started] (HBASE-14446) Save table descriptors and region infos during incremental backup

2015-09-16 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-14446 started by Vladimir Rodionov.
-
> Save table descriptors and region infos during incremental backup 
> --
>
> Key: HBASE-14446
> URL: https://issues.apache.org/jira/browse/HBASE-14446
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
>
> The current implementation of incremental backup just moves WAL files into 
> the backup directory. The restore procedure for an incremental backup relies 
> on the full restore (from snapshot) as the source of all table metadata. 
> Two problems: 
> # Table configuration/properties may change after the full backup, and we will 
> lose this info during restore.
> # We cannot convert WAL files into HFiles without the table description and 
> layout. This is needed for the merge tool. 





[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression

2015-09-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791304#comment-14791304
 ] 

Lars Hofhansl commented on HBASE-14352:
---

That's a good point. If there's never any advantage to compressing WALs, let's 
get rid of the code.

I think [~abhishek.chouhan] found write performance neutral with much reduced 
storage (20%). Only replication was significantly slower. It would certainly be 
nice if we could compress between DCs when doing replication (but that's a 
different issue).

> Replication is terribly slow with WAL compression
> -
>
> Key: HBASE-14352
> URL: https://issues.apache.org/jira/browse/HBASE-14352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.13
>Reporter: Abhishek Singh Chouhan
> Attachments: age_of_last_shipped.png, size_of_log_queue.png
>
>
> For the same load, replication with WAL compression enabled is almost 6x 
> slower than with compression turned off. Age of last shipped operation is 
> also correspondingly much higher when compression is turned on. 
> By observing Size of log queue we can see that it is taking too much time for 
> the queue to clear up.
> Attaching corresponding graphs.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791295#comment-14791295
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-TRUNK #6815 (See 
[https://builds.apache.org/job/HBase-TRUNK/6815/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev 08eabb89f60b821362efaba2701ddb9db5ff8b32)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14334:
--
  Resolution: Fixed
Release Note: 
Move external block cache to its own module. This will reduce dependencies 
for people who use hbase-server.
Currently Memcached is the reference implementation for external block cache. 
External block caches allow HBase to take advantage of other more complex 
caches that can live longer than the HBase regionserver process and are not 
necessarily tied to a single computer's lifetime. However, external block 
caches add extra operational overhead.
  Status: Resolved  (was: Patch Available)

> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791253#comment-14791253
 ] 

stack commented on HBASE-10449:
---

OK. Not what we want. Let's look at alternatives...

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.





[jira] [Resolved] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-13250.

   Resolution: Fixed
Fix Version/s: 0.98.15

Patch 13250-0.98-v2.txt compiles with both hadoop-2 and hadoop-1 profiles.

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13250:
---
Attachment: 13250-0.98-v2.txt

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791241#comment-14791241
 ] 

Hadoop QA commented on HBASE-14334:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12756333/HBASE-14334-v1.patch
  against master branch at commit bd26386dc7205c9b30b8488bc094bd380ec09adb.
  ATTACHMENT ID: 12756333

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  
${project.build.directory}/test-classes/mrapp-generated-classpath
+  
${project.build.directory}/test-classes/mrapp-generated-classpath

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestReplicationShell

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15626//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15626//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15626//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15626//console

This message is automatically generated.

> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791204#comment-14791204
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-1.1 #665 (See 
[https://builds.apache.org/job/HBase-1.1/665/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev a1f45c1c43dfda4b044f948d4de5089662aa306b)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791198#comment-14791198
 ] 

Hudson commented on HBASE-13250:


SUCCESS: Integrated in HBase-1.0 #1053 (See 
[https://builds.apache.org/job/HBase-1.0/1053/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev e12b771560b94ee7843225af36f0857e6571a10a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791189#comment-14791189
 ] 

Hudson commented on HBASE-14433:


SUCCESS: Integrated in HBase-1.3-IT #161 (See 
[https://builds.apache.org/job/HBase-1.3-IT/161/])
HBASE-14433 Set down the client executor core thread count from 256 in tests: 
REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 
82554e275017bf1eb941a3b3c3145f5c2516cf54)
* hbase-client/src/test/resources/hbase-site.xml
* hbase-server/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791190#comment-14791190
 ] 

Hudson commented on HBASE-13250:


SUCCESS: Integrated in HBase-1.3-IT #161 (See 
[https://builds.apache.org/job/HBase-1.3-IT/161/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev 6598f18e564bf06348e99863548546f092808c35)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791155#comment-14791155
 ] 

Hudson commented on HBASE-14433:


SUCCESS: Integrated in HBase-1.2-IT #151 (See 
[https://builds.apache.org/job/HBase-1.2-IT/151/])
HBASE-14433 Set down the client executor core thread count from 256 in tests: 
REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 
5764fab04d6234c77ec0333c1878237f420cc83c)
* hbase-server/src/test/resources/hbase-site.xml
* hbase-client/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791156#comment-14791156
 ] 

Hudson commented on HBASE-13250:


SUCCESS: Integrated in HBase-1.2-IT #151 (See 
[https://builds.apache.org/job/HBase-1.2-IT/151/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev b243c898e72d835a731d893c853c958072d42038)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791145#comment-14791145
 ] 

Nicolas Liochon commented on HBASE-10449:
-

It's the former: in this case, the queries are queued. A new thread will be 
created only when the queue is full. Then, if we reach maxThreads and the queue 
is full, the new tasks are rejected. In our case the queue is nearly unbounded, 
so we stay with corePoolSize.
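
The queueing rule above can be observed with a small standalone program. This is a minimal sketch using only java.util.concurrent, not the HBase client code; the class name QueueBeforeThreads, the helper observe, and the sizes (core 2, max 8, 10 tasks) are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of HBase.
class QueueBeforeThreads {

    /**
     * Submits `tasks` blocking tasks to a pool backed by an unbounded queue and
     * returns {poolSize, queuedTasks} as observed while all tasks are still blocked.
     */
    static int[] observe(int core, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until we have measured
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first `core` tasks each started a thread; the rest were queued,
            // because the unbounded queue never reports "full".
            int[] result = { pool.getPoolSize(), pool.getQueue().size() };
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = observe(2, 8, 10);
        // Expected with these sizes: poolSize=2 queued=8
        System.out.println("poolSize=" + r[0] + " queued=" + r[1]);
    }
}
```

The maximum of 8 is never reached here: growth beyond the core size only happens once the queue rejects an offer, which an unbounded LinkedBlockingQueue does not do.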

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.





[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791140#comment-14791140
 ] 

Hudson commented on HBASE-14433:


FAILURE: Integrated in HBase-1.2 #178 (See 
[https://builds.apache.org/job/HBase-1.2/178/])
HBASE-14433 Set down the client executor core thread count from 256 in tests: 
REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 
5764fab04d6234c77ec0333c1878237f420cc83c)
* hbase-server/src/test/resources/hbase-site.xml
* hbase-client/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791135#comment-14791135
 ] 

stack commented on HBASE-10449:
---

That makes sense. What happens if there is a query every second, i.e. so 
there are periods when we have more queries than coreSize? Do the > coreSize 
queries go in the queue, or do we make new threads to handle them? If the 
latter, good; if the former, bad. Let me look at the other issue.

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.





[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13250:
---
Fix Version/s: (was: 0.98.15)

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Reopened] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-13250:


Reverted from 0.98 due to compilation error against hadoop-1 profile

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791129#comment-14791129
 ] 

Nicolas Liochon commented on HBASE-10449:
-

The algo for the ThreadPoolExecutor is:

onNewTask(){
  if (currentSize < coreSize) createNewThread() else reuseThread()
}

And there is a timeout for each thread.

So with a coreSize of 2, a timeout of 20s, and a query every 15s, we have:
0s: query1: create thread1, poolSize=1
15s: query2: create thread2, poolSize=2
20s: close thread1, poolSize=1
30s: query3: create thread3, poolSize=2
35s: close thread2, poolSize=1
45s: query4: create thread4, poolSize=2

And so on. So even with only 1 query every 15s, we have 2 threads in the pool 
nearly all the time.
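
The timeline above can be reproduced with a small deterministic model. This is only a sketch of the simplified rule described in this comment (create a thread whenever the pool is below coreSize, expire a thread once it has been idle for the timeout). It does not run a real java.util.concurrent.ThreadPoolExecutor, queries are assumed to finish instantly, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the simplified pool rule described above; not HBase or JDK code. */
public class PoolTimeline {

  /**
   * Returns the pool size observed right after each query is submitted.
   * A thread's idle clock starts at the time it was last handed a query.
   */
  public static List<Integer> simulate(int coreSize, int timeoutSec, int periodSec, int queries) {
    List<Integer> lastUsed = new ArrayList<>(); // last time each live thread ran a query
    List<Integer> sizes = new ArrayList<>();
    for (int q = 0; q < queries; q++) {
      int now = q * periodSec;
      // Expire threads that have been idle for at least the timeout.
      for (Iterator<Integer> it = lastUsed.iterator(); it.hasNext();) {
        if (now - it.next() >= timeoutSec) {
          it.remove();
        }
      }
      if (lastUsed.size() < coreSize) {
        // Below coreSize: a new thread is created even if another one is idle.
        lastUsed.add(now);
      } else {
        // At coreSize: model reuse by refreshing the longest-idle thread.
        int oldest = 0;
        for (int i = 1; i < lastUsed.size(); i++) {
          if (lastUsed.get(i) < lastUsed.get(oldest)) {
            oldest = i;
          }
        }
        lastUsed.set(oldest, now);
      }
      sizes.add(lastUsed.size());
    }
    return sizes;
  }

  public static void main(String[] args) {
    // coreSize=2, timeout=20s, one query every 15s, as in the example above.
    System.out.println(simulate(2, 20, 15, 4)); // prints [1, 2, 2, 2]
  }
}
```

With coreSize=1 the same model stays at a single thread, which is the behaviour one would expect for this query rate.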

> Yes. Smile. Need to revive it for here and for doing client timeouts
I found the code in TestClientNoCluster#run , ready to be reused!

I think we need to go for a hack like the one on Stack Overflow, or for a 
different TPE implementation like HBASE-11590...

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.





[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791127#comment-14791127
 ] 

Hudson commented on HBASE-14433:


FAILURE: Integrated in HBase-1.3 #180 (See 
[https://builds.apache.org/job/HBase-1.3/180/])
HBASE-14433 Set down the client executor core thread count from 256 in tests: 
REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 
82554e275017bf1eb941a3b3c3145f5c2516cf54)
* hbase-server/src/test/resources/hbase-site.xml
* hbase-client/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791120#comment-14791120
 ] 

Hudson commented on HBASE-13250:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1077 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1077/])
HBASE-13250 chown of ExportSnapshot does not cover all path and files (He 
Liangliang) (tedyu: rev bcd986e47b8d633c996c8a2040c2a40b32cb5c59)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791074#comment-14791074
 ] 

Hadoop QA commented on HBASE-14128:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12756303/HBASE-14128-v0.patch
  against master branch at commit bd26386dc7205c9b30b8488bc094bd380ec09adb.
  ATTACHMENT ID: 12756303

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1837 checkstyle errors (more than the master's current 1835 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportExport
  org.apache.hadoop.hbase.util.TestProcessBasedCluster
  org.apache.hadoop.hbase.regionserver.TestWALLockup

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15625//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15625//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15625//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15625//console

This message is automatically generated.

> Fix inability to run Multiple MR over the same Snapshot
> ---
>
> Key: HBASE-14128
> URL: https://issues.apache.org/jira/browse/HBASE-14128
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, snapshots
>Reporter: Matteo Bertozzi
>Assignee: santosh kumar
>Priority: Minor
>  Labels: beginner, noob
> Attachments: HBASE-14128-v0.patch
>
>
> from the list, running multiple MR over the same snapshot does not work
> {code}
> public static void copySnapshotForScanner(Configuration conf, FileSystem ..
> RestoreSnapshotHelper helper = new RestoreSnapshotHelper(conf, fs,
>   manifest, manifest.getTableDescriptor(), restoreDir, monitor, status);
> {code}
> the problem is that manifest.getTableDescriptor() will try to clone the 
> snapshot with the same target name. ending up in "file already exist" 
> exceptions.
> we just need to clone that descriptor and generate a new target table name





[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression

2015-09-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791070#comment-14791070
 ] 

Andrew Purtell commented on HBASE-14352:


When I've tested wal compression I've found the hit to write performance 
(increased latency leading to a lower aggregate write ceiling cluster-wide) to 
outweigh space savings and any gains from that. Is this the general experience? 
Maybe the answer is to deprecate WAL compression? 

> Replication is terribly slow with WAL compression
> -
>
> Key: HBASE-14352
> URL: https://issues.apache.org/jira/browse/HBASE-14352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.13
>Reporter: Abhishek Singh Chouhan
> Attachments: age_of_last_shipped.png, size_of_log_queue.png
>
>
> For the same load, replication with WAL compression enabled is almost 6x 
> slower than with compression turned off. Age of last shipped operation is 
> also correspondingly much higher when compression is turned on. 
> By observing Size of log queue we can see that it is taking too much time for 
> the queue to clear up.
> Attaching corresponding graphs.





[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13250:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.1.3
   1.0.3
   0.98.15
   1.3.0
   1.2.0
   2.0.0

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13250:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch, Liangliang.

> chown of ExportSnapshot does not cover all path and files
> -
>
> Key: HBASE-13250
> URL: https://issues.apache.org/jira/browse/HBASE-13250
> Project: HBase
>  Issue Type: Bug
>Reporter: He Liangliang
>Assignee: He Liangliang
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-13250-V0.patch
>
>
> The chuser/chgroup function only covers the leaf hfile. The ownership of 
> hfile parent paths and snapshot reference files are not changed as expected.





[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails

2015-09-16 Thread Samir Ahmic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791053#comment-14791053
 ] 

Samir Ahmic commented on HBASE-14431:
-

This is interesting. I have run TestFastFail several times on two different 
machines and the test never fails. I was using Java 1.7.0_80 and 1.7.0_71.

> AsyncRpcClient#removeConnection() never removes connection from connections 
> pool if server fails
> 
>
> Key: HBASE-14431
> URL: https://issues.apache.org/jira/browse/HBASE-14431
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 2.0.0, 1.0.2, 1.1.2
>Reporter: Samir Ahmic
>Assignee: Samir Ahmic
>Priority: Critical
> Attachments: HBASE-14431.patch
>
>
> I was playing with the master branch in distributed mode (3 rs + master + 
> backup_master) and noticed strange behavior when testing this sequence 
> of events on a single rs: /kill/start/run_balancer while a client was writing 
> data to the cluster (LoadTestTool).
> I noticed that LTT fails with the following:
> {code}
> 2015-09-09 11:05:58,364 INFO  [main] client.AsyncProcess: #2, waiting for 
> some tasks to finish. Expected max=0, tasksInProgress=35
> Exception in thread "main" 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 
> action: BindException: 1 time, 
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208)
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211)
> {code}
> After some digging and adding more logging to the code, I noticed that the 
> following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel 
> connection) {code} is never true:
> {code}
> if (connectionInPool == connection) {
> {code} 
> so the {code}AsyncRpcChannel{code} connection is never removed from the 
> {code}connections{code} pool when an rs fails.
> After changing this condition to:
> {code}
> if (connectionInPool.address.equals(connection.address)) {
> {code}
> the issue was resolved and the client removed the failed server from the 
> connections pool.
> I will attach a patch after running some more tests.
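
The difference between the two conditions quoted above can be shown in isolation. The sketch below uses hypothetical stand-ins (ConnectionPoolSketch, Channel, a string pool key) rather than the real AsyncRpcClient and AsyncRpcChannel classes, and it assumes the pool holds a different object for the same server address than the one passed to the remove call. Why that happens in the real client is not established here.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for the pooled channel and the pool; not the real HBase classes. */
public class ConnectionPoolSketch {

  /** Minimal channel that only carries the server address. */
  public static final class Channel {
    final InetSocketAddress address;

    public Channel(InetSocketAddress address) {
      this.address = address;
    }
  }

  private final Map<String, Channel> connections = new HashMap<>();

  public void put(String key, Channel channel) {
    connections.put(key, channel);
  }

  public int size() {
    return connections.size();
  }

  /** Removes the pooled entry only if it is the very same object (reference comparison). */
  public boolean removeByIdentity(String key, Channel connection) {
    Channel inPool = connections.get(key);
    if (inPool == connection) {
      connections.remove(key);
      return true;
    }
    return false;
  }

  /** Removes the pooled entry if it points at the same server address (value comparison). */
  public boolean removeByAddress(String key, Channel connection) {
    Channel inPool = connections.get(key);
    if (inPool != null && inPool.address.equals(connection.address)) {
      connections.remove(key);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    ConnectionPoolSketch pool = new ConnectionPoolSketch();
    // Two distinct objects for the same (unresolved, made-up) server address.
    Channel pooled = new Channel(InetSocketAddress.createUnresolved("rs1.example.com", 16020));
    Channel failed = new Channel(InetSocketAddress.createUnresolved("rs1.example.com", 16020));
    pool.put("rs1", pooled);

    System.out.println(pool.removeByIdentity("rs1", failed)); // prints false
    System.out.println(pool.removeByAddress("rs1", failed));  // prints true
  }
}
```

One trade-off worth keeping in mind: comparing by address would also remove a newer, healthy channel that happens to target the same address, so whether this is the right fix depends on how the pool is repopulated.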





[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression

2015-09-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791052#comment-14791052
 ] 

Lars Hofhansl commented on HBASE-14352:
---

I took a look at the code some weeks back. The problem immediately jumps out... 
At the source we constantly reset the read position into the current WAL. With 
compression it means we have to start from a point where the compression 
dictionary is written. That is very expensive.

We have to do that in order to be sure we'll see the edits in the current block 
being written.
So I don't immediately see a way out of it. Perhaps we simply tail until we 
reach the end of a file. In that case we'll try one more time with a reset, 
and only declare the WAL done when that is done.

> Replication is terribly slow with WAL compression
> -
>
> Key: HBASE-14352
> URL: https://issues.apache.org/jira/browse/HBASE-14352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.13
>Reporter: Abhishek Singh Chouhan
> Attachments: age_of_last_shipped.png, size_of_log_queue.png
>
>
> For the same load, replication with WAL compression enabled is almost 6x 
> slower than with compression turned off. Age of last shipped operation is 
> also correspondingly much higher when compression is turned on. 
> By observing Size of log queue we can see that it is taking too much time for 
> the queue to clear up.
> Attaching corresponding graphs.





[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791038#comment-14791038
 ] 

Hudson commented on HBASE-14433:


FAILURE: Integrated in HBase-TRUNK #6814 (See 
[https://builds.apache.org/job/HBase-TRUNK/6814/])
HBASE-14433 Set down the client executor core thread count from 256 in tests: 
REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 
bd26386dc7205c9b30b8488bc094bd380ec09adb)
* hbase-server/src/test/resources/hbase-site.xml
* hbase-client/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14334:
--
Attachment: HBASE-14334-v1.patch

Patch with a better description.

> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Resolved] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options

2015-09-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-14445.

Resolution: Duplicate

> ExportSnapshot does not honor -chuser, -chgroup, -chmod options
> ---
>
> Key: HBASE-14445
> URL: https://issues.apache.org/jira/browse/HBASE-14445
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
>Reporter: Ted Yu
>
> Create a snapshot of an existing HBase table, export the snapshot using the 
> -chuser, -chgroup, -chmod options.
> Look in hdfs filesystem for export. The files do not have the correct 
> ownership, group, permissions
> Thanks to Ian Roberts who first reported the issue.





[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790979#comment-14790979
 ] 

Elliott Clark commented on HBASE-14334:
---

bq.The above is all the doc I'd see this module getting so say something about 
when it'd be used and how to enable it.
I'm still hoping to provide better. You know how that goes though.

> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334-v1.patch, HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790967#comment-14790967
 ] 

Hudson commented on HBASE-14433:


FAILURE: Integrated in HBase-TRUNK #6813 (See 
[https://builds.apache.org/job/HBase-TRUNK/6813/])
Revert "HBASE-14433 Set down the client executor core thread count from 256 to 
number of processors" (stack: rev 8633b26ee5095e82a9792a86dc5c95a4cf23f858)
* hbase-client/src/test/resources/hbase-site.xml
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
* hbase-server/src/test/resources/hbase-site.xml


> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options

2015-09-16 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790963#comment-14790963
 ] 

Matteo Bertozzi commented on HBASE-14445:
-

isn't this the same as HBASE-13250?

> ExportSnapshot does not honor -chuser, -chgroup, -chmod options
> ---
>
> Key: HBASE-14445
> URL: https://issues.apache.org/jira/browse/HBASE-14445
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
>Reporter: Ted Yu
>
> Create a snapshot of an existing HBase table, export the snapshot using the 
> -chuser, -chgroup, -chmod options.
> Look in hdfs filesystem for export. The files do not have the correct 
> ownership, group, permissions
> Thanks to Ian Roberts who first reported the issue.





[jira] [Created] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options

2015-09-16 Thread Ted Yu (JIRA)
Ted Yu created HBASE-14445:
--

 Summary: ExportSnapshot does not honor -chuser, -chgroup, -chmod 
options
 Key: HBASE-14445
 URL: https://issues.apache.org/jira/browse/HBASE-14445
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.4
Reporter: Ted Yu


Create a snapshot of an existing HBase table, export the snapshot using the 
-chuser, -chgroup, -chmod options.
Look in hdfs filesystem for export. The files do not have the correct 
ownership, group, permissions

Thanks to Ian Roberts who first reported the issue.





[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression

2015-09-16 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790911#comment-14790911
 ] 

Abhishek Singh Chouhan commented on HBASE-14352:


Yep...both of them had compression enabled.

> Replication is terribly slow with WAL compression
> -
>
> Key: HBASE-14352
> URL: https://issues.apache.org/jira/browse/HBASE-14352
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.13
>Reporter: Abhishek Singh Chouhan
> Attachments: age_of_last_shipped.png, size_of_log_queue.png
>
>
> For the same load, replication with WAL compression enabled is almost 6x 
> slower than with compression turned off. Age of last shipped operation is 
> also correspondingly much higher when compression is turned on. 
> By observing Size of log queue we can see that it is taking too much time for 
> the queue to clear up.
> Attaching corresponding graphs.





[jira] [Commented] (HBASE-14443) Add request parameter to the TooSlow/TooLarge warn message of RpcServer

2015-09-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790841#comment-14790841
 ] 

Nick Dimiduk commented on HBASE-14443:
--

Agreed. Anything will help here. Also, HBASE-14333.

> Add request parameter to the TooSlow/TooLarge warn message of RpcServer
> ---
>
> Key: HBASE-14443
> URL: https://issues.apache.org/jira/browse/HBASE-14443
> Project: HBase
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Jianwei Cui
>Priority: Minor
> Fix For: 1.2.1
>
>
> The RpcServer will log a warn message for TooSlow or TooLarge request as:
> {code}
> logResponse(new Object[]{param},
> md.getName(), md.getName() + "(" + param.getClass().getName() + 
> ")",
> (tooLarge ? "TooLarge" : "TooSlow"),
> status.getClient(), startTime, processingTime, qTime,
> responseSize);
> {code}
> The RpcServer#logResponse will create the warn message as:
> {code}
> if (params.length == 2 && server instanceof HRegionServer &&
> params[0] instanceof byte[] &&
> params[1] instanceof Operation) {
>   ...
>   responseInfo.putAll(((Operation) params[1]).toMap());
>   ...
> } else if (params.length == 1 && server instanceof HRegionServer &&
> params[0] instanceof Operation) {
>   ...
>   responseInfo.putAll(((Operation) params[0]).toMap());
>   ...
> } else {
>   ...
> }
> {code}
> Because the parameter is always a protobuf message, not an instance of 
> Operation, the request parameter is never added to the warn message. The 
> parameter helps in diagnosing the problem; for example, knowing the 
> startRow/endRow is useful for a TooSlow scan. To improve the warn message, we 
> can transform the protobuf request message into the corresponding Operation 
> subclass object with ProtobufUtil, so that it can be added to the warn message. 
> Suggestions and discussion are welcome.
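
To make the point about the instanceof checks concrete, here is a minimal sketch. Every type in it is a hypothetical stand-in: Operation is reduced to a one-method interface, ScanRequestMessage plays the role of a protobuf request, and toOperation is an invented conversion, not a ProtobufUtil method. It only shows that a parameter which is not an Operation contributes nothing to the logged details, and that converting it first would.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the branch structure quoted above; not the RpcServer code. */
public class SlowLogSketch {

  /** Stand-in for a client operation that can describe itself as a map. */
  public interface Operation {
    Map<String, Object> toMap();
  }

  /** Stand-in for a protobuf request message; note that it does not implement Operation. */
  public static final class ScanRequestMessage {
    final String startRow;
    final String stopRow;

    public ScanRequestMessage(String startRow, String stopRow) {
      this.startRow = startRow;
      this.stopRow = stopRow;
    }
  }

  /** Only parameters that are Operations contribute request details to the log entry. */
  public static Map<String, Object> describe(Object param) {
    Map<String, Object> info = new HashMap<>();
    if (param instanceof Operation) {
      info.putAll(((Operation) param).toMap());
    }
    return info;
  }

  /** Invented conversion step: wrap the message so that it can describe itself. */
  public static Operation toOperation(final ScanRequestMessage msg) {
    return new Operation() {
      @Override
      public Map<String, Object> toMap() {
        Map<String, Object> m = new HashMap<>();
        m.put("startRow", msg.startRow);
        m.put("stopRow", msg.stopRow);
        return m;
      }
    };
  }

  public static void main(String[] args) {
    ScanRequestMessage msg = new ScanRequestMessage("row-a", "row-z");
    System.out.println(describe(msg).isEmpty());                // prints true
    System.out.println(describe(toOperation(msg)).get("startRow")); // prints row-a
  }
}
```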





[jira] [Commented] (HBASE-14442) MultiTableInputFormatBase.getSplits dosenot build split for a scan whose startRow=stopRow=(startRow of a region)

2015-09-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790833#comment-14790833
 ] 

Nick Dimiduk commented on HBASE-14442:
--

Hi Nathan, can you provide a unit test that demonstrates this bug? See 
https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java
 for existing tests.

> MultiTableInputFormatBase.getSplits dosenot build split for a scan whose 
> startRow=stopRow=(startRow of a region)
> 
>
> Key: HBASE-14442
> URL: https://issues.apache.org/jira/browse/HBASE-14442
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.1.2
>Reporter: Nathan
>Assignee: Nathan
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> I created a Scan whose startRow and stopRow are both equal to a region's 
> startRow, and found that no split (map) was built for it. 
> The following is the source code of the condition:
> (startRow.length == 0 || keys.getSecond()[i].length == 0 ||
> Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) &&
> (stopRow.length == 0 || Bytes.compareTo(stopRow,
> keys.getFirst()[i]) > 0)
> I think a "=" should be added.
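
The effect of the missing "=" can be checked with a self-contained restatement of the condition. This is a sketch under assumptions: keys.getFirst()[i] is taken to be the region start key and keys.getSecond()[i] the region end key, the compare helper is a local unsigned lexicographic comparison standing in for Bytes.compareTo, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical restatement of the region-overlap test quoted above; not the HBase source. */
public class SplitOverlapSketch {

  /** Lexicographic comparison that treats bytes as unsigned values. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** The condition as quoted: stopRow must sort strictly after the region start. */
  public static boolean overlapsStrict(byte[] startRow, byte[] stopRow,
      byte[] regionStart, byte[] regionEnd) {
    return (startRow.length == 0 || regionEnd.length == 0 || compare(startRow, regionEnd) < 0)
        && (stopRow.length == 0 || compare(stopRow, regionStart) > 0);
  }

  /** The suggested variant: also accept a stopRow equal to the region start. */
  public static boolean overlapsInclusive(byte[] startRow, byte[] stopRow,
      byte[] regionStart, byte[] regionEnd) {
    return (startRow.length == 0 || regionEnd.length == 0 || compare(startRow, regionEnd) < 0)
        && (stopRow.length == 0 || compare(stopRow, regionStart) >= 0);
  }

  public static void main(String[] args) {
    byte[] row = "b".getBytes(StandardCharsets.UTF_8);         // startRow == stopRow
    byte[] regionStart = "b".getBytes(StandardCharsets.UTF_8); // region is [b, c)
    byte[] regionEnd = "c".getBytes(StandardCharsets.UTF_8);

    System.out.println(overlapsStrict(row, row, regionStart, regionEnd));    // prints false
    System.out.println(overlapsInclusive(row, row, regionStart, regionEnd)); // prints true
  }
}
```

Note that the inclusive variant would also select a region for an ordinary range scan whose stopRow equals that region's start key. If stopRow is exclusive for such scans, it may be preferable to special-case startRow equal to stopRow instead of relaxing the comparison for every scan.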





[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790813#comment-14790813
 ] 

Hadoop QA commented on HBASE-14431:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12756275/HBASE-14431.patch
  against master branch at commit d2e338181800ae3cef55ddca491901b65259dc7f.
  ATTACHMENT ID: 12756275

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFastFail

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15624//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15624//console

This message is automatically generated.

> AsyncRpcClient#removeConnection() never removes connection from connections 
> pool if server fails
> 
>
> Key: HBASE-14431
> URL: https://issues.apache.org/jira/browse/HBASE-14431
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 2.0.0, 1.0.2, 1.1.2
>Reporter: Samir Ahmic
>Assignee: Samir Ahmic
>Priority: Critical
> Attachments: HBASE-14431.patch
>
>
> I was playing with the master branch in distributed mode (3 rs + master + 
> backup_master) and noticed strange behavior when testing this sequence 
> of events on a single rs: /kill/start/run_balancer while a client was writing 
> data to the cluster (LoadTestTool).
> I noticed that LTT fails with the following:
> {code}
> 2015-09-09 11:05:58,364 INFO  [main] client.AsyncProcess: #2, waiting for 
> some tasks to finish. Expected max=0, tasksInProgress=35
> Exception in thread "main" 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 
> action: BindException: 1 time, 
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208)
>   at 
> org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211)
> {code}
> After some digging and adding more logging to the code, I noticed that the 
> following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel 
> connection){code} is never true:
> {code}
> if (connectionInPool == connection) {
> {code}
> As a result, the {code}AsyncRpcChannel{code} connection is never removed from 
> the {code}connections{code} pool when a region server fails.
> After changing this condition to:
> {code}
> if (connectionInPool.address.equals(connection.address)) {
> {code}
> the issue was resolved and the client removed the failed server from the 
> connections pool.
> I will attach a patch after running some more tests.
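
A minimal sketch of why an identity check can miss where an address check matches. It assumes, without confirmation from the report, that the pool ends up holding a different channel object for the same server address than the one handed to removeConnection(). The Channel class and both remove methods are hypothetical stand-ins, not the real AsyncRpcChannel or AsyncRpcClient code.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

public class RemoveConnectionSketch {
  /** Hypothetical stand-in for AsyncRpcChannel: only the address matters here. */
  public static final class Channel {
    public final InetSocketAddress address;

    public Channel(InetSocketAddress address) {
      this.address = address;
    }
  }

  /** Removes the pooled channel only if it is the very same object. */
  public static boolean removeByIdentity(Map<Integer, Channel> pool, int key, Channel failed) {
    Channel inPool = pool.get(key);
    if (inPool == failed) {
      pool.remove(key);
      return true;
    }
    return false;
  }

  /** Removes the pooled channel if it points at the same server address. */
  public static boolean removeByAddress(Map<Integer, Channel> pool, int key, Channel failed) {
    Channel inPool = pool.get(key);
    if (inPool != null && inPool.address.equals(failed.address)) {
      pool.remove(key);
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Made-up host and port, used only for illustration.
    InetSocketAddress rs = InetSocketAddress.createUnresolved("rs1.example.com", 16020);
    Map<Integer, Channel> pool = new HashMap<>();
    pool.put(1, new Channel(rs));

    // Assumption: the failed channel is a different object for the same server.
    Channel failed = new Channel(rs);
    System.out.println("removed by identity: " + removeByIdentity(pool, 1, failed));
    System.out.println("removed by address:  " + removeByAddress(pool, 1, failed));
  }
}
```

Under that assumption the identity check leaves the stale entry in the pool, while the address comparison removes it.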



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot

2015-09-16 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-14128:

Status: Patch Available  (was: Open)

> Fix inability to run Multiple MR over the same Snapshot
> ---
>
> Key: HBASE-14128
> URL: https://issues.apache.org/jira/browse/HBASE-14128
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce, snapshots
>Reporter: Matteo Bertozzi
>Assignee: santosh kumar
>Priority: Minor
>  Labels: beginner, noob
> Attachments: HBASE-14128-v0.patch
>
>
> From the mailing list: running multiple MR jobs over the same snapshot does not work.
> {code}
> public static void copySnapshotForScanner(Configuration conf, FileSystem ..
> RestoreSnapshotHelper helper = new RestoreSnapshotHelper(conf, fs,
>   manifest, manifest.getTableDescriptor(), restoreDir, monitor, status);
> {code}
> The problem is that, with manifest.getTableDescriptor(), each run will try to 
> clone the snapshot with the same target name, ending in "file already exists" 
> exceptions.
> We just need to clone that descriptor and generate a new target table name.
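
One way to picture the proposed fix, as a sketch only: copy the descriptor and give each copy a unique table name so that concurrent jobs do not restore to the same target. The Descriptor class and the UUID-suffix naming scheme are assumptions made for illustration; the actual patch may derive the name differently.

```java
import java.util.UUID;

public class SnapshotCloneNameSketch {
  /** Hypothetical, minimal stand-in for a table descriptor. */
  public static final class Descriptor {
    public final String tableName;
    public final String schema;

    public Descriptor(String tableName, String schema) {
      this.tableName = tableName;
      this.schema = schema;
    }
  }

  /** Copies the descriptor, replacing only the table name with a unique one. */
  public static Descriptor cloneWithUniqueName(Descriptor original) {
    String suffix = UUID.randomUUID().toString().replace("-", "");
    return new Descriptor(original.tableName + "_" + suffix, original.schema);
  }

  public static void main(String[] args) {
    Descriptor fromManifest = new Descriptor("usertable", "cf1,cf2");
    // Two jobs over the same snapshot each get their own restore target name.
    Descriptor jobA = cloneWithUniqueName(fromManifest);
    Descriptor jobB = cloneWithUniqueName(fromManifest);
    System.out.println(jobA.tableName);
    System.out.println(jobB.tableName);
  }
}
```

The descriptor read from the manifest is left untouched, so a later job still sees the original table name.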





[jira] [Updated] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot

2015-09-16 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-14128:

Attachment: HBASE-14128-v0.patch

This was much harder than I thought...
Attached a v0 which should solve the htd problem, but the code can (and 
should) be cleaned up more.
TableSnapshotScanner was the easiest one to clean up: I removed the double 
snapshot manifest read and the use of the wrong HRI (the one from the snapshot 
instead of the one from the restore).
The SnapshotFormatImpl still has the double manifest read, but I fixed the 
wrong HRI use in a hacky way. The problem here is that RestoreHelper is called 
in setInput() and I don't see an easy way to pass the list of HRIs that we have 
there to the Split method.



[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790758#comment-14790758
 ] 

Ted Yu commented on HBASE-14411:


TestWALLockup passed here:
https://builds.apache.org/job/HBase-1.3/jdk=latest1.7,label=Hadoop/178/console

> Fix unit test failures when using multiwal as default WAL provider
> --
>
> Key: HBASE-14411
> URL: https://issues.apache.org/jira/browse/HBASE-14411
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, 
> HBASE-14411_v2.patch
>
>
> If we set hbase.wal.provider to multiwal in 
> hbase-server/src/test/resources/hbase-site.xml, which lets the unit tests use 
> BoundedRegionGroupingProvider, we observe the failures below in the 
> current code base:
> {noformat}
> Failed tests:
>   TestHLogRecordReader>TestWALRecordReader.testPartialRead:164 expected:<1> 
> but was:<2>
>   TestHLogRecordReader>TestWALRecordReader.testWALRecordReader:216 
> expected:<2> but was:<3>
>   TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2>
>   TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3>
>   TestDistributedLogSplitting.testRecoveredEdits:276 edits dir should have 
> more than a single file in it. instead has 1
>   TestAtomicOperation.testMultiRowMutationMultiThreads:499 expected:<0> but 
> was:<1>
>   TestHRegionServerBulkLoad.testAtomicBulkLoad:307
> Expected: is 
>  but: was 
>   TestLogRolling.testCompactionRecordDoesntBlockRolling:611 Should have WAL; 
> one table is not flushed expected:<1> but was:<0>
>   TestLogRolling.testLogRollOnDatanodeDeath:359 null
>   TestLogRolling.testLogRollOnPipelineRestart:472 Missing datanode should've 
> triggered a log roll
>   TestReplicationSourceManager.testLogRoll:237 expected:<6> but was:<7>
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if 
> skip.errors is false all files should remain in place expected:<11> but 
> was:<12>
>   TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the 
> archive log expected:<11> but was:<12>
>   TestWALSplit.testMovedWALDuringRecovery:810->retryOverHdfsProblem:793 
> expected:<11> but was:<12>
>   TestWALSplit.testRetryOpenDuringRecovery:838->retryOverHdfsProblem:793 
> expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594
>  if skip.errors is false all files should remain in place expected:<11> but 
> was:<12>
>   TestWALSplitCompressed>TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong 
> number of files in the archive log expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testMovedWALDuringRecovery:810->TestWALSplit.retryOverHdfsProblem:793
>  expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testRetryOpenDuringRecovery:838->TestWALSplit.retryOverHdfsProblem:793
>  expected:<11> but was:<12>
> {noformat}
> While the patch for HBASE-14306 could resolve the failures of TestHLogRecordReader, 
> TestReplicationSourceManager and TestReplicationWALReaderManager, this JIRA 
> will focus on resolving the others.





[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790751#comment-14790751
 ] 

Ted Yu commented on HBASE-14411:


Did you mean TestWALLockup ?

I ran it locally before committing the patch - it passed.

This patch doesn't change default wal provider to multiwal. So the test failure 
was not related.



[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790748#comment-14790748
 ] 

Elliott Clark commented on HBASE-14411:
---

That last test failure on Hadoop QA looks really related.



[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790729#comment-14790729
 ] 

stack commented on HBASE-14334:
---

+1

On commit, add more to this: HBase external block cache.

The above is all the doc I can see this module getting, so say something about 
when it would be used and how to enable it. Replicate that as the release note 
on this issue.

> Move Memcached block cache in to it's own optional module.
> --
>
> Key: HBASE-14334
> URL: https://issues.apache.org/jira/browse/HBASE-14334
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-14334.patch
>
>






[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790724#comment-14790724
 ] 

Hudson commented on HBASE-14411:


SUCCESS: Integrated in HBase-1.3-IT #160 (See 
[https://builds.apache.org/job/HBase-1.3-IT/160/])
HBASE-14411 Fix unit test failures when using multiwal as default WAL provider 
(Yu Li) (tedyu: rev 0452ba09b53fb450c913811b77d74b6035b40ce3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DefaultWALProvider.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALSplit.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java




[jira] [Commented] (HBASE-12751) Allow RowLock to be reader writer

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790723#comment-14790723
 ] 

stack commented on HBASE-12751:
---

kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py  
https://builds.apache.org/job/PreCommit-HBASE-Build/15616//consoleText
Fetching the console output from the URL
Printing hanging tests
Hanging test : org.apache.hadoop.hbase.TestIOFencing
Hanging test : org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Hanging test : org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure
Hanging test : 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestReplicasClient

Let me look into these.

> Allow RowLock to be reader writer
> -
>
> Key: HBASE-12751
> URL: https://issues.apache.org/jira/browse/HBASE-12751
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 12751.rebased.v25.txt, 12751.rebased.v26.txt, 
> 12751.rebased.v26.txt, 12751.rebased.v27.txt, 12751.rebased.v29.txt, 
> 12751.rebased.v31.txt, 12751.rebased.v32.txt, 12751.rebased.v32.txt, 
> 12751.rebased.v33.txt, 12751.rebased.v34.txt, 12751.rebased.v35.txt, 
> 12751.rebased.v35.txt, 12751.rebased.v35.txt, 12751.v37.txt, 12751.v38.txt, 
> 12751v22.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 
> 12751v36.txt, HBASE-12751-v1.patch, HBASE-12751-v10.patch, 
> HBASE-12751-v10.patch, HBASE-12751-v11.patch, HBASE-12751-v12.patch, 
> HBASE-12751-v13.patch, HBASE-12751-v14.patch, HBASE-12751-v15.patch, 
> HBASE-12751-v16.patch, HBASE-12751-v17.patch, HBASE-12751-v18.patch, 
> HBASE-12751-v19 (1).patch, HBASE-12751-v19.patch, HBASE-12751-v2.patch, 
> HBASE-12751-v20.patch, HBASE-12751-v20.patch, HBASE-12751-v21.patch, 
> HBASE-12751-v3.patch, HBASE-12751-v4.patch, HBASE-12751-v5.patch, 
> HBASE-12751-v6.patch, HBASE-12751-v7.patch, HBASE-12751-v8.patch, 
> HBASE-12751-v9.patch, HBASE-12751.patch
>
>
> Right now every write operation grabs a row lock. This is to prevent values 
> from changing during a read modify write operation (increment or check and 
> put). However it limits parallelism in several different scenarios.
> If there are several puts to the same row but different columns or stores 
> then this is very limiting.
> If there are puts to the same column then mvcc number should ensure a 
> consistent ordering. So locking is not needed.
> However locking for check and put or increment is still needed.





[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790710#comment-14790710
 ] 

stack commented on HBASE-14278:
---

kalashnikov:hbase.git.commit stack$ python dev-support/findHangingTests.py  
https://builds.apache.org/job/PreCommit-HBASE-Build/15617/consoleText
Fetching the console output from the URL
Printing hanging tests
Printing Failing tests
Failing test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Failing test : org.apache.hadoop.hbase.client.TestReplicaWithCluster

TestReplicaWithCluster, I see, is showing up as a hang. I'll take a look. The 
other failure looks unrelated. I'll look at that too.

+1 on patch. This emission is ugly, currently spewing all over test runs. Thanks 
[~eclark]

On commit, shove e.getMessage() on the end of this log message just so we can 
be sure it is that old faithful, the NPE: 

  } catch (Exception e) {
    // Ignored. If this errors out it means that someone is double
    // closing the region source and the region is already nulled out.
    LOG.info("Error trying to remove " + toRemove + " from " +
      this.getClass().getSimpleName());
  }


> Fix NPE that is showing up since HBASE-14274 went in
> 
>
> Key: HBASE-14278
> URL: https://issues.apache.org/jira/browse/HBASE-14278
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.0.0, 1.2.0, 1.3.0
>Reporter: stack
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, 
> HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, 
> HBASE-14278.patch
>
>
> Saw this in TestDistributedLogSplitting after HBASE-14274 was applied.
> {code}
> 2015-08-20 15:31:10,704 WARN  [HBase-Metrics2-1] impl.MetricsConfig(124): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] lib.MethodMetric$2(118): Error invoking method getBlocksTotal
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>   at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>   at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>   at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221)
>   at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229)
>   at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>   at com.sun.proxy.$Proxy13.postStart(Unknown Source)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>   at org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run

[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790698#comment-14790698
 ] 

stack commented on HBASE-10449:
---

bq. I expect that if we have more than coreSize calls in timeout (256 vs 60 
seconds in our case) then we always have coreSize threads.

Say again. I'm not following [~nkeywal]  Thanks.

bq. ...the protobuf nightmare if you remember 

Yes. Smile. Need to revive it for here and for doing client timeouts

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is some confusion in the configuration of the pool; the attached patch 
> fixes it. This may change client performance, as we were using a 
> single thread.
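
The description does not say which settings were confused. One well-known java.util.concurrent.ThreadPoolExecutor behavior that can leave a pool on a single thread is that threads beyond corePoolSize are only created when the work queue refuses a task, which an unbounded LinkedBlockingQueue never does. The sketch below illustrates that behavior only; the sizes and the PoolSizingSketch class are made up and are not taken from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingSketch {
  /** Submits blocking tasks and reports how many threads the pool started. */
  public static int threadsStarted(int core, int max, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    // Idle core threads may exit after the keep-alive time.
    pool.allowCoreThreadTimeOut(true);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            release.await(); // keep the worker busy until released
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return pool.getPoolSize();
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // core=1, max=8: extra tasks are queued, the pool stays at one thread.
    System.out.println(threadsStarted(1, 8, 8));
    // core=8, max=8: one thread per task, up to the core size.
    System.out.println(threadsStarted(8, 8, 8));
  }
}
```

With an unbounded queue, the maximum pool size has no effect; raising the core size and letting core threads time out is one way to get both parallelism and idle-thread cleanup.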





[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790692#comment-14790692
 ] 

stack commented on HBASE-14221:
---

bq. . But atleast for a single CF case I think these comparison can be reduced.

How does this extend to the MultiCF case?

So, about a 10% difference for this added complexity?

@larsh You are probably interested in this.

Why the need for two flags? Why is the isSingleColumnFamily test not enough? 
When would we have a single store scanner in the store heap but more than one 
in the joined heap?

    // Indicates if the storeHeap is formed of only one StoreScanner
    boolean singleStoreScannerHeap = false;
    // Indicates if the joinedHeap is formed of only one StoreScanner.
    boolean singleStoreScannerJoinedHeap = false;

Why add a flag here?

      boolean moreValues = populateResult(results, this.joinedHeap, scannerContext,
    -     joinedContinuationRow);
    +     joinedContinuationRow, singleStoreScannerJoinedHeap);

Why not just have the flag be in the scanner context?


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-14221.patch, HBASE-14221_1.patch, 
> HBASE-14221_1.patch, withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> We found this while profiling with the PE tool.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) In ScanQueryMatcher
> {code}
> int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
>     return MatchCode.DONE;
>   } else if (ret >= 1) {
>     // could optimize this, if necessary?
>     // Could also be called SEEK_TO_CURRENT_ROW, but this
>     // should be rare/never happens.
>     return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
>     return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
>     return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || matcher.curCell == null ||
>     isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> This check is there specifically to see whether we are in a new row.
> 3) In HRegion
> {code}
> scannerContext.setKeepProgress(true);
> heap.next(results, scannerContext);
> scannerContext.setKeepProgress(tmpKeepProgress);
> nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to be careful in the MultiCF case. 
> I was trying to solve this for the MultiCF case, but there are a lot of cases 
> to handle. At least for the single-CF case, I think these comparisons can be 
> reduced.
> For a single CF, the SQM can tell whether we have crossed a row using the 
> code pasted above. That comparison is definitely needed.
> In the single-CF case the HRegion heap has only one element, so the 3rd 
> comparison can be avoided whenever StoreScanner.next() ended because the SQM 
> returned MatchCode.DONE.
> As for the 2nd compareRows, done in StoreScanner.next(): even that can be 
> avoided if we know that the previous next() call ended because of a new row. 
> With all of this, compareRows dropped from 19% to 13% in the profiler. 
> Initially we can solve the single-CF case, which can later be extended to 
> MultiCF cases.
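
To make the single-CF idea concrete, here is a minimal sketch. It is not the HBase code: `RowCompareSketch` and everything in it are hypothetical names, and cells are reduced to their row keys. The inner loop plays the role of the SQM check, and the outer loop either repeats the row comparison (as HRegion does today) or trusts the flag handed back by the inner loop.

```java
class RowCompareSketch {
  private final String[] rows; // row key of each cell, already sorted
  private int pos = 0;
  int comparisons = 0;         // how many row comparisons were performed

  RowCompareSketch(String... rows) {
    this.rows = rows;
  }

  private boolean sameRow(String a, String b) {
    comparisons++;
    return a.equals(b);
  }

  /** Inner scanner: consumes one row; returns true if it stopped at a row boundary. */
  private boolean consumeRow() {
    String current = rows[pos++];
    while (pos < rows.length) {
      if (!sameRow(current, rows[pos])) {
        return true; // the matcher saw a new row (the DONE case)
      }
      pos++;
    }
    return false; // ran out of cells
  }

  /** Outer scanner: returns the number of rows seen. */
  int scan(boolean trustInnerFlag) {
    int rowCount = 0;
    while (pos < rows.length) {
      String current = rows[pos];
      boolean stoppedAtBoundary = consumeRow();
      rowCount++;
      boolean moreCellsInRow;
      if (pos >= rows.length) {
        moreCellsInRow = false;                       // nothing left to peek at
      } else if (trustInnerFlag) {
        moreCellsInRow = !stoppedAtBoundary;          // reuse the inner answer
      } else {
        moreCellsInRow = sameRow(current, rows[pos]); // redundant re-comparison
      }
      if (moreCellsInRow) {
        throw new IllegalStateException("inner scanner stopped mid-row");
      }
    }
    return rowCount;
  }

  public static void main(String[] args) {
    RowCompareSketch naive = new RowCompareSketch("r1", "r1", "r2", "r3", "r3", "r3");
    RowCompareSketch trusting = new RowCompareSketch("r1", "r1", "r2", "r3", "r3", "r3");
    System.out.println(naive.scan(false) + " rows, " + naive.comparisons + " comparisons");
    System.out.println(trusting.scan(true) + " rows, " + trusting.comparisons + " comparisons");
  }
}
```

On this six-cell input both variants see 3 rows; the re-comparing variant performs 7 comparisons and the flag-trusting one 5. The saving is one comparison per row boundary, which only holds under the single-scanner assumption made above.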





[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider

2015-09-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790684#comment-14790684
 ] 

Hudson commented on HBASE-14411:


FAILURE: Integrated in HBase-1.3 #178 (See 
[https://builds.apache.org/job/HBase-1.3/178/])
HBASE-14411 Fix unit test failures when using multiwal as default WAL provider 
(Yu Li) (tedyu: rev 0452ba09b53fb450c913811b77d74b6035b40ce3)
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALSplit.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DefaultWALProvider.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java


> Fix unit test failures when using multiwal as default WAL provider
> --
>
> Key: HBASE-14411
> URL: https://issues.apache.org/jira/browse/HBASE-14411
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, 
> HBASE-14411_v2.patch
>
>
> If we set hbase.wal.provider to multiwal in 
> hbase-server/src/test/resources/hbase-site.xml, which makes the unit tests 
> use BoundedRegionGroupingProvider, we observe the failures below in the 
> current code base:
> {noformat}
> Failed tests:
>   TestHLogRecordReader>TestWALRecordReader.testPartialRead:164 expected:<1> 
> but was:<2>
>   TestHLogRecordReader>TestWALRecordReader.testWALRecordReader:216 
> expected:<2> but was:<3>
>   TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2>
>   TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3>
>   TestDistributedLogSplitting.testRecoveredEdits:276 edits dir should have 
> more than a single file in it. instead has 1
>   TestAtomicOperation.testMultiRowMutationMultiThreads:499 expected:<0> but 
> was:<1>
>   TestHRegionServerBulkLoad.testAtomicBulkLoad:307
> Expected: is 
>  but: was 
>   TestLogRolling.testCompactionRecordDoesntBlockRolling:611 Should have WAL; 
> one table is not flushed expected:<1> but was:<0>
>   TestLogRolling.testLogRollOnDatanodeDeath:359 null
>   TestLogRolling.testLogRollOnPipelineRestart:472 Missing datanode should've 
> triggered a log roll
>   TestReplicationSourceManager.testLogRoll:237 expected:<6> but was:<7>
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestReplicationWALReaderManager.test:155 null
>   TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if 
> skip.errors is false all files should remain in place expected:<11> but 
> was:<12>
>   TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the 
> archive log expected:<11> but was:<12>
>   TestWALSplit.testMovedWALDuringRecovery:810->retryOverHdfsProblem:793 
> expected:<11> but was:<12>
>   TestWALSplit.testRetryOpenDuringRecovery:838->retryOverHdfsProblem:793 
> expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594
>  if skip.errors is false all files should remain in place expected:<11> but 
> was:<12>
>   TestWALSplitCompressed>TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong 
> number of files in the archive log expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testMovedWALDuringRecovery:810->TestWALSplit.retryOverHdfsProblem:793
>  expected:<11> but was:<12>
>   
> TestWALSplitCompressed>TestWALSplit.testRetryOpenDuringRecovery:838->TestWALSplit.retryOverHdfsProblem:793
>  expected:<11> but was:<12>
> {noformat}
> While patch for HBASE-14306 could resolve failures of TestHLogRecordReader, 
> TestReplicationSourceManager and TestReplicationWALReaderManager, this JIRA 
> will focus on resolving the others





[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790660#comment-14790660
 ] 

Nicolas Liochon commented on HBASE-10449:
-

> I was thinking that we'd go to core size – say # of cores – and then if one 
> request a second, we'd just stay at core size because there would be a free 
> thread when the request-per-second came in (assuming request took a good deal 
> < a second).

I expect that if we get more than coreSize calls within the keepalive timeout 
(256 calls in 60 seconds in our case), then we always have coreSize threads.
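
The behavior behind this can be checked against a plain `java.util.concurrent.ThreadPoolExecutor`. The sketch below is not the HBase client pool: the class name and sizes are made up for illustration, and it assumes an unbounded `LinkedBlockingQueue` with core-thread timeout enabled. It submits tasks strictly one at a time, so there is always an idle worker, and still the pool grows by one thread per submit until it reaches the core size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CorePoolGrowthSketch {
  /**
   * Runs `tasks` short tasks strictly one after another and reports how many
   * worker threads the pool holds afterwards.
   */
  static int poolSizeAfterSequentialTasks(int coreSize, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        coreSize, coreSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    // Assumption for this sketch: idle core threads are allowed to time out.
    pool.allowCoreThreadTimeOut(true);
    try {
      for (int i = 0; i < tasks; i++) {
        CountDownLatch done = new CountDownLatch(1);
        pool.execute(done::countDown);
        done.await(); // wait, so at most one task is ever in flight
      }
      return pool.getPoolSize();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(poolSizeAfterSequentialTasks(8, 20));
    System.out.println(poolSizeAfterSequentialTasks(8, 3));
  }
}
```

With a core size of 8, twenty sequential tasks leave 8 threads and three tasks leave 3. `execute()` starts a new worker whenever fewer than corePoolSize threads exist, even if the others are idle, so idle threads only go away once the keepalive expires.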

> Didn't we have a mock server somewhere such that we could standup a client 
> with no friction and watch it in operation? I thought we'd make such a 
> beast
Yep, you built one, we used it when we looked at the perf issues in the client 
(the protobuf nightmare if you remember ;:-)). 


> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.





[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14433:
--
Release Note: Tests run with client executors that have core thread count 
of 4 and a keepalive of 3 seconds. They used to default to 256 core threads and 
60 seconds  for keepalive.  (was: Change the client executor core thread count 
to be number of processors instead of 256: i.e. the equivalent of the maximum 
threads allowed on client. The config to set it back to 256 or any other value 
is "hbase.hconnection.threads.core".

Also set it so core is set to default 4 threads in client core in tests (and 
keepalive is downed from a minute to 3 seconds).)
 Summary: Set down the client executor core thread count from 256 in 
tests  (was: Set down the client executor core thread count from 256 to number 
of processors)

> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-13770) Programmatic JAAS configuration option for secure zookeeper may be broken

2015-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790641#comment-14790641
 ] 

Hadoop QA commented on HBASE-13770:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12756251/HBASE-13770-0.98.patch
  against 0.98 branch at commit d2e338181800ae3cef55ddca491901b65259dc7f.
  ATTACHMENT ID: 12756251

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
23 warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3873 checkstyle errors (more than the master's current 3869 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  public static final String ZK_CLIENT_KERBEROS_PRINCIPLE = 
"hbase.zookeeper.client.kerberos.principal";
+  public static final String ZK_SERVER_KERBEROS_PRINCIPLE = 
"hbase.zookeeper.server.kerberos.principal";

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15623//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15623//console

This message is automatically generated.

> Programmatic JAAS configuration option for secure zookeeper may be broken
> -
>
> Key: HBASE-13770
> URL: https://issues.apache.org/jira/browse/HBASE-13770
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.13, 1.2.0
>Reporter: Andrew Purtell
>Assignee: Maddineni Sukumar
> Fix For: 0.98.13
>
> Attachments: HBASE-13770-0.98.patch, HBASE-13770-v1.patch, 
> HBASE-13770-v2.patch
>
>
> While verifying the patch fix for HBASE-13768 we were unable to successfully 
> test the programmatic JAAS configuration option for secure ZooKeeper 
> integration. Unclear if that was due to a bug or incorrect test configuration.
> Update the security section of the online book with clear instructions for 
> setting up the programmatic JAAS configuration option for secure ZooKeeper 
> integration.
> Verify it works.
> Fix as necessary.





[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14433:
--
Fix Version/s: 1.3.0
   1.2.0

> Set down the client executor core thread count from 256 in tests
> 
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 
> 14433v4.reapply.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests

2015-09-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14433:
--
Attachment: 14433v4.reapply.txt

Here is what I reapplied under the rubric of this issue. It just changes the 
config for tests. I applied to 1.2+.



[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 to number of processors

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790588#comment-14790588
 ] 

stack commented on HBASE-14433:
---

Ok. Reverting the patch I applied last night because discussion is ongoing over 
in HBASE-10449. I'm instead going to just set limits for tests only.

> Set down the client executor core thread count from 256 to number of 
> processors
> ---
>
> Key: HBASE-14433
> URL: https://issues.apache.org/jira/browse/HBASE-14433
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, 
> 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt
>
>
> HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a 
> recent test run core dump, I see up to 256 threads per client and all are 
> idle. At a minimum it makes it hard reading test thread dumps. Trying to 
> learn more about why we went a core of 256 over in HBASE-10449. Meantime will 
> try setting down configs for test.





[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790587#comment-14790587
 ] 

stack commented on HBASE-10449:
---

Thanks [~nkeywal]

bq. We should not see 256 threads, because they should expire already

Maybe they spin up inside the keepalive time of 60 seconds.

bq. We will still have 60 threads, because each new request will create a new 
thread until we reach coreSize

Well, I was thinking that we'd go to core size -- say # of cores -- and then if 
one request a second, we'd just stay at core size because there would be a free 
thread when the request-per-second came in (assuming request took a good deal < 
a second).

Let me look at HBASE-11590.

What I saw was each client with hundreds of threads -- up to 256 on one -- all 
waiting, as follows:

{code}
"hconnection-0x3065a6a9-shared--pool13-t247" daemon prio=10 tid=0x7f31c1ab2000 nid=0x7718 waiting on condition [0x7f2f9ecec000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0007f841b388> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{code}

... usually in TestReplicasClient.  Here is an example: 
https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/15581/consoleText
  See the zombies at the end.

I also have second thoughts on HBASE-14433. I am going to change it so we set 
config for tests only. I am thinking we need to do more work before we can set 
the core threads down from max.

Thanks [~nkeywal] I'll look at HBASE-11590.

Didn't we have a mock server somewhere such that we could stand up a client 
with no friction and watch it in operation? I thought we'd made such a beast.



[jira] [Commented] (HBASE-14443) Add request parameter to the TooSlow/TooLarge warn message of RpcServer

2015-09-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790528#comment-14790528
 ] 

stack commented on HBASE-14443:
---

Anything to make this stuff more useful is welcome (+1 on transform)

> Add request parameter to the TooSlow/TooLarge warn message of RpcServer
> ---
>
> Key: HBASE-14443
> URL: https://issues.apache.org/jira/browse/HBASE-14443
> Project: HBase
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Jianwei Cui
>Priority: Minor
> Fix For: 1.2.1
>
>
> The RpcServer will log a warn message for TooSlow or TooLarge request as:
> {code}
> logResponse(new Object[]{param},
> md.getName(), md.getName() + "(" + param.getClass().getName() + 
> ")",
> (tooLarge ? "TooLarge" : "TooSlow"),
> status.getClient(), startTime, processingTime, qTime,
> responseSize);
> {code}
> The RpcServer#logResponse will create the warn message as:
> {code}
> if (params.length == 2 && server instanceof HRegionServer &&
> params[0] instanceof byte[] &&
> params[1] instanceof Operation) {
>   ...
>   responseInfo.putAll(((Operation) params[1]).toMap());
>   ...
> } else if (params.length == 1 && server instanceof HRegionServer &&
> params[0] instanceof Operation) {
>   ...
>   responseInfo.putAll(((Operation) params[0]).toMap());
>   ...
> } else {
>   ...
> }
> {code}
> Because the parameter is always a protobuf message, not an instance of 
> Operation, the request parameter is never added to the warn message. The 
> parameter helps in tracking down the problem; for example, knowing the 
> startRow/endRow is useful for a TooSlow scan. To improve the warn message, we 
> can transform the protobuf request message into the corresponding Operation 
> subclass object via ProtobufUtil, so that it can be added to the warn message. 
> Suggestions and discussion are welcome.
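
A rough sketch of the proposed transform, for illustration only. Nothing here is HBase API: `Operation`, `ScanRequestMessage`, `toOperation` and `responseInfo` are all hypothetical stand-ins, with `toOperation` in the role the issue suggests for ProtobufUtil. It shows why an `instanceof Operation` check skips a raw request message, and how converting first lets the request details reach the logged map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlowCallLogSketch {
  /** Hypothetical stand-in for Operation: anything that can describe itself as a map. */
  interface Operation {
    Map<String, Object> toMap();
  }

  /** Hypothetical stand-in for a protobuf scan request. Note it is not an Operation. */
  static final class ScanRequestMessage {
    final String startRow;
    final String stopRow;

    ScanRequestMessage(String startRow, String stopRow) {
      this.startRow = startRow;
      this.stopRow = stopRow;
    }
  }

  /** Hypothetical converter, in the role the issue proposes for ProtobufUtil. */
  static Operation toOperation(ScanRequestMessage message) {
    return () -> {
      Map<String, Object> map = new LinkedHashMap<>();
      map.put("startRow", message.startRow);
      map.put("stopRow", message.stopRow);
      return map;
    };
  }

  /** Builds the map that would back the warn message. */
  static Map<String, Object> responseInfo(Object param, boolean transform) {
    Map<String, Object> info = new LinkedHashMap<>();
    info.put("class", param.getClass().getSimpleName());
    Object described = param;
    if (transform && param instanceof ScanRequestMessage) {
      described = toOperation((ScanRequestMessage) param);
    }
    if (described instanceof Operation) {
      // True for real Operations and for requests that were transformed above.
      info.putAll(((Operation) described).toMap());
    }
    return info;
  }

  public static void main(String[] args) {
    ScanRequestMessage scan = new ScanRequestMessage("row-0001", "row-9999");
    System.out.println(responseInfo(scan, false)); // class name only
    System.out.println(responseInfo(scan, true));  // class name plus start/stop rows
  }
}
```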





[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails

2015-09-16 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790514#comment-14790514
 ] 

Ted Yu commented on HBASE-14431:


lgtm

nit: connection.hashCode() is computed twice. You can save the return value in 
a local variable.

> AsyncRpcClient#removeConnection() never removes connection from connections 
> pool if server fails
> 
>
> Key: HBASE-14431
> URL: https://issues.apache.org/jira/browse/HBASE-14431
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 2.0.0, 1.0.2, 1.1.2
>Reporter: Samir Ahmic
>Assignee: Samir Ahmic
>Priority: Critical
> Attachments: HBASE-14431.patch
>
>
> I was playing with the master branch in distributed mode (3 rs + master + 
> backup_master) and noticed strange behavior while testing this sequence of 
> events on a single rs: /kill/start/run_balancer, while a client was writing 
> data to the cluster (LoadTestTool).
> I noticed that LTT fails with the following:
> {code}
> 2015-09-09 11:05:58,364 INFO  [main] client.AsyncProcess: #2, waiting for some tasks to finish. Expected max=0, tasksInProgress=35
> Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: BindException: 1 time, 
>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208)
>   at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697)
>   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211)
> {code}
> After some digging and adding some more logging to the code, I noticed that 
> the following condition in 
> {{AsyncRpcClient.removeConnection(AsyncRpcChannel connection)}} is never true:
> {code}
> if (connectionInPool == connection) {
> {code}
> As a result, the {{AsyncRpcChannel}} connection is never removed from the 
> {{connections}} pool when an rs fails.
> After changing this condition to:
> {code}
> if (connectionInPool.address.equals(connection.address)) {
> {code}
> the issue was resolved and the client removed the failed server from the 
> connections pool.
> I will attach a patch after running some more tests.
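
The difference between the two conditions can be shown in isolation. The sketch below is not the AsyncRpcClient code: `Channel` is a hypothetical stand-in for AsyncRpcChannel, the host name and port are made up, and it assumes the pooled object and the one passed in are distinct instances that point at the same server.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

class RemoveConnectionSketch {
  /** Hypothetical stand-in for AsyncRpcChannel; only the address matters here. */
  static final class Channel {
    final InetSocketAddress address;

    Channel(InetSocketAddress address) {
      this.address = address;
    }
  }

  /** Original condition: removes only if both references are the very same object. */
  static boolean removeByIdentity(Map<Integer, Channel> pool, int key, Channel failed) {
    Channel inPool = pool.get(key);
    if (inPool == failed) {
      pool.remove(key);
      return true;
    }
    return false;
  }

  /** Patched condition: removes if both channels point at the same address. */
  static boolean removeByAddress(Map<Integer, Channel> pool, int key, Channel failed) {
    Channel inPool = pool.get(key);
    if (inPool != null && inPool.address.equals(failed.address)) {
      pool.remove(key);
      return true;
    }
    return false;
  }

  /** Returns { removed by identity check, removed by address check }. */
  static boolean[] demo() {
    // createUnresolved avoids any DNS lookup; equality is by host name and port.
    Channel pooled = new Channel(InetSocketAddress.createUnresolved("rs1.example.org", 16020));
    Channel failed = new Channel(InetSocketAddress.createUnresolved("rs1.example.org", 16020));
    Map<Integer, Channel> pool = new HashMap<>();
    pool.put(1, pooled);
    boolean byIdentity = removeByIdentity(pool, 1, failed);
    boolean byAddress = removeByAddress(pool, 1, failed);
    return new boolean[] {byIdentity, byAddress};
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("identity check removed: " + result[0]);
    System.out.println("address check removed: " + result[1]);
  }
}
```

Under that assumption the identity check leaves the stale entry in place and the address check removes it.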





[jira] [Updated] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails

2015-09-16 Thread Samir Ahmic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samir Ahmic updated HBASE-14431:

Attachment: HBASE-14431.patch

Here is a patch fixing this issue. I have noticed that there is a pause of 
about 50s in the client between detecting that the session has been reset 
(killing the rs) and removing the connection to this server from the 
connections pool. I will probably open a new ticket for this once I have dug 
up more info on why this pause is so long.


