[jira] [Created] (HBASE-14448) Refine RegionGroupingProvider Phase-2: remove provider nesting and formalize wal group name
Yu Li created HBASE-14448: - Summary: Refine RegionGroupingProvider Phase-2: remove provider nesting and formalize wal group name Key: HBASE-14448 URL: https://issues.apache.org/jira/browse/HBASE-14448 Project: HBase Issue Type: Improvement Reporter: Yu Li Assignee: Yu Li Currently we nest DefaultWALProvider inside RegionGroupingProvider, which makes the logic ambiguous since a "provider" should itself provide logs. Suggest instantiating FSHLog directly in RegionGroupingProvider. Regarding the wal group name, RegionGroupingProvider currently uses something like "-null-", which is quite long and unnecessary. Suggest directly using ".". For more details, please refer to the initial patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
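The two suggestions can be sketched roughly as follows; the class name, the "regiongroup-" prefix, and the hash-based grouping are illustrative assumptions, not HBase's actual implementation:

```java
public class GroupingSketch {
  // Hash the encoded region name into one of numGroups wal groups.
  // (Assumed scheme for illustration; HBase's strategy may differ.)
  static String group(String encodedRegionName, int numGroups) {
    int idx = (encodedRegionName.hashCode() & Integer.MAX_VALUE) % numGroups;
    return "regiongroup-" + idx;
  }

  // Join the wal prefix and group name with a plain "." delimiter,
  // replacing the older "-null-"-style infix mentioned in the issue.
  static String walName(String prefix, String groupName) {
    return prefix + "." + groupName;
  }

  public static void main(String[] args) {
    System.out.println(walName("regionserver", group("0123abcd", 4)));
  }
}
```

The key point is that the group name is computed once per region and the delimiter is a single character, so wal file names stay short and stable.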
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791650#comment-14791650 ] Yu Li commented on HBASE-14411: --- From the [testReport|https://builds.apache.org/job/PreCommit-HBASE-Build/15614//testReport/org.apache.hadoop.hbase.regionserver/TestWALLockup/testLockupWhenSyncInMiddleOfZigZagSetup/], the failure of the case should be caused by an intermittent env issue; below is the exception thrown in TestWALLockup: {noformat} Caused by: java.io.IOException: FAKE! Failed to replace a bad datanode...APPEND at org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.append(TestWALLockup.java:173) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1880) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1748) {noformat} Thanks [~eclark] for the attention, and [~tedyu] for help taking a look. > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch > > > If we set hbase.wal.provider to multiwal in > hbase-server/src/test/resources/hbase-site.xml which allows us to use > BoundedRegionGroupingProvider in UT, we will observe below failures in > current code base: > {noformat} > Failed tests: > TestHLogRecordReader>TestWALRecordReader.testPartialRead:164 expected:<1> > but was:<2> > TestHLogRecordReader>TestWALRecordReader.testWALRecordReader:216 > expected:<2> but was:<3> > TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2> > TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3> > TestDistributedLogSplitting.testRecoveredEdits:276 edits dir should have > more than a single file in it. 
instead has 1 > TestAtomicOperation.testMultiRowMutationMultiThreads:499 expected:<0> but > was:<1> > TestHRegionServerBulkLoad.testAtomicBulkLoad:307 > Expected: is > but: was > TestLogRolling.testCompactionRecordDoesntBlockRolling:611 Should have WAL; > one table is not flushed expected:<1> but was:<0> > TestLogRolling.testLogRollOnDatanodeDeath:359 null > TestLogRolling.testLogRollOnPipelineRestart:472 Missing datanode should've > triggered a log roll > TestReplicationSourceManager.testLogRoll:237 expected:<6> but was:<7> > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if > skip.errors is false all files should remain in place expected:<11> but > was:<12> > TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the > archive log expected:<11> but was:<12> > TestWALSplit.testMovedWALDuringRecovery:810->retryOverHdfsProblem:793 > expected:<11> but was:<12> > TestWALSplit.testRetryOpenDuringRecovery:838->retryOverHdfsProblem:793 > expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 > if skip.errors is false all files should remain in place expected:<11> but > was:<12> > TestWALSplitCompressed>TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong > number of files in the archive log expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testMovedWALDuringRecovery:810->TestWALSplit.retryOverHdfsProblem:793 > expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testRetryOpenDuringRecovery:838->TestWALSplit.retryOverHdfsProblem:793 > expected:<11> but 
was:<12> > {noformat} > While patch for HBASE-14306 could resolve failures of TestHLogRecordReader, > TestReplicationSourceManager and TestReplicationWALReaderManager, this JIRA > will focus on resolving the others -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791641#comment-14791641 ] Hudson commented on HBASE-14082: SUCCESS: Integrated in HBase-1.2-IT #153 (See [https://builds.apache.org/job/HBase-1.2-IT/153/]) HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev 9f420d0ac6175a7245efe68c27fc32458bca1b86) * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timeout. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. 
> # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace__table__region__metric_{code} > to > {code}namespace__table__region__replicaid__metric_{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
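The proposed name change can be sketched as below; the separator strings and the rule that only non-primary replicas (replica id > 0) get the extra component are assumptions for illustration, not the exact HBase code:

```java
public class MetricNameSketch {
  // Compose the JMX metric name with and without the replica id, following
  // the namespace/table/region/replicaid/metric layout described in the
  // issue. Separators and the replicaId > 0 rule are assumed here.
  static String metricName(String ns, String table, String region,
                           Integer replicaId, String metric) {
    StringBuilder sb = new StringBuilder()
        .append("Namespace_").append(ns)
        .append("_table_").append(table)
        .append("_region_").append(region);
    if (replicaId != null && replicaId > 0) {
      // Primary regions (replica id 0) keep the old name shape.
      sb.append("_replicaid_").append(replicaId);
    }
    sb.append("_metric_").append(metric);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(metricName("default", "t1", "abc123", 1, "readRequestCount"));
    System.out.println(metricName("default", "t1", "abc123", 0, "readRequestCount"));
  }
}
```

With a scheme like this, dashboards can filter on the replicaid component to separate primary-read latency from replica-read latency.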
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791628#comment-14791628 ] Hudson commented on HBASE-14278: FAILURE: Integrated in HBase-TRUNK #6817 (See [https://builds.apache.org/job/HBase-TRUNK/6817/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev c1ac4bb8601f88eb3fe246eb62c3f40e95faf93d) * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791629#comment-14791629 ] Hudson commented on HBASE-14082: FAILURE: Integrated in HBase-TRUNK #6817 (See [https://builds.apache.org/job/HBase-TRUNK/6817/]) HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev 17bdf9fa8cbe920578c09c38960dd0450746fe5c) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timeout. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. 
> # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace__table__region__metric_{code} > to > {code}namespace__table__region__replicaid__metric_{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791630#comment-14791630 ] Hudson commented on HBASE-14274: FAILURE: Integrated in HBase-TRUNK #6817 (See [https://builds.apache.org/job/HBase-TRUNK/6817/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev c1ac4bb8601f88eb3fe246eb62c3f40e95faf93d) * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs > MetricsRegionAggregateSourceImpl > --- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, > HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay. 
> We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 > waiting on condition [0x00011f7ac000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00075636d8c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120) > at > org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41) > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344) > - locked <0x0007ff878190> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. 
They want to > get a write lock on this classes local ReentrantReadWriteLock while holding > MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with > above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting > on condition [0x000140ea5000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007cade1480> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193) > at > org.apache.hadoop.hb
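The inversion described above, and the standard fix of agreeing on one global acquisition order, can be sketched as follows; the class and lock names are illustrative stand-ins, not the HBase classes:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
  static final ReentrantReadWriteLock aggregateLock = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock regionSourceLock = new ReentrantReadWriteLock();

  // Deadlock-free ordering: every path takes aggregateLock before
  // regionSourceLock. The reported hang came from close() and snapshot()
  // taking the two locks in opposite orders.
  static boolean close() {
    aggregateLock.writeLock().lock();
    try {
      regionSourceLock.writeLock().lock();
      try {
        return true; // deregister the per-region source here
      } finally {
        regionSourceLock.writeLock().unlock();
      }
    } finally {
      aggregateLock.writeLock().unlock();
    }
  }

  static boolean snapshot() {
    aggregateLock.readLock().lock();
    try {
      regionSourceLock.readLock().lock();
      try {
        return true; // read the metrics snapshot here
      } finally {
        regionSourceLock.readLock().unlock();
      }
    } finally {
      aggregateLock.readLock().unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread closer = new Thread(LockOrderSketch::close);
    Thread snapshotter = new Thread(LockOrderSketch::snapshot);
    closer.start();
    snapshotter.start();
    closer.join();
    snapshotter.join();
    System.out.println("both paths completed with a consistent lock order");
  }
}
```

The alternative fix, used in practice in some metrics code, is to remove one of the nested locks entirely so there is no ordering to get wrong.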
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791615#comment-14791615 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-0.98 #1125 (See [https://builds.apache.org/job/HBase-0.98/1125/]) HBASE-13250 Revert due to compilation error against hadoop-1 profile (tedyu: rev 38995fbd51ac4735b673dd1527cb2631b69b7474) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev 88a620892883ac878bde3ea3c64c7275600b7085) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12751) Allow RowLock to be reader writer
[ https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791603#comment-14791603 ] stack commented on HBASE-12751: --- Dang. The hangs are legit and reproducible. Will be back after try and figure the why. > Allow RowLock to be reader writer > - > > Key: HBASE-12751 > URL: https://issues.apache.org/jira/browse/HBASE-12751 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.3.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.3.0 > > Attachments: 12751.rebased.v25.txt, 12751.rebased.v26.txt, > 12751.rebased.v26.txt, 12751.rebased.v27.txt, 12751.rebased.v29.txt, > 12751.rebased.v31.txt, 12751.rebased.v32.txt, 12751.rebased.v32.txt, > 12751.rebased.v33.txt, 12751.rebased.v34.txt, 12751.rebased.v35.txt, > 12751.rebased.v35.txt, 12751.rebased.v35.txt, 12751.v37.txt, 12751.v38.txt, > 12751v22.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, > 12751v36.txt, HBASE-12751-v1.patch, HBASE-12751-v10.patch, > HBASE-12751-v10.patch, HBASE-12751-v11.patch, HBASE-12751-v12.patch, > HBASE-12751-v13.patch, HBASE-12751-v14.patch, HBASE-12751-v15.patch, > HBASE-12751-v16.patch, HBASE-12751-v17.patch, HBASE-12751-v18.patch, > HBASE-12751-v19 (1).patch, HBASE-12751-v19.patch, HBASE-12751-v2.patch, > HBASE-12751-v20.patch, HBASE-12751-v20.patch, HBASE-12751-v21.patch, > HBASE-12751-v3.patch, HBASE-12751-v4.patch, HBASE-12751-v5.patch, > HBASE-12751-v6.patch, HBASE-12751-v7.patch, HBASE-12751-v8.patch, > HBASE-12751-v9.patch, HBASE-12751.patch > > > Right now every write operation grabs a row lock. This is to prevent values > from changing during a read modify write operation (increment or check and > put). However it limits parallelism in several different scenarios. > If there are several puts to the same row but different columns or stores > then this is very limiting. 
> If there are puts to the same column then mvcc number should ensure a > consistent ordering. So locking is not needed. > However locking for check and put or increment is still needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
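The proposal can be sketched with a plain reader-writer lock; RowLockSketch and its methods are hypothetical stand-ins, not HBase classes. Plain puts share the lock (MVCC orders them), while read-modify-write operations take it exclusively:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RowLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Many puts to the same row may proceed in parallel; mvcc numbers give
  // a consistent ordering for writes to the same column.
  void put(Runnable mutation) {
    lock.readLock().lock();
    try {
      mutation.run();
    } finally {
      lock.readLock().unlock();
    }
  }

  // Read-modify-write (increment, checkAndPut) must still be exclusive,
  // so it takes the write side of the row lock.
  long incrementAndGet(long[] cell, long delta) {
    lock.writeLock().lock();
    try {
      cell[0] += delta;
      return cell[0];
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

This is exactly the parallelism win the issue describes: contention remains only where atomic read-modify-write semantics genuinely require it.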
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791601#comment-14791601 ] stack commented on HBASE-11590: --- Should we down the keepalive timeout so it is seconds only? We have allowCoreThreadTimeOut(true); Core threads would run up to the max but could also go down to zero as is noted in http://stackoverflow.com/questions/19528304/how-to-get-the-threadpoolexecutor-to-increase-threads-to-max-before-queueing/19528305#19528305 Or the suggestion by Ralph H at answered Oct 23 '13 at 10:15 in the link looks simple (after executing the current reset the core thread size if not enough for current requests). There is a new answer on the end... with a GPL soln. > use a specific ThreadPoolExecutor > - > > Key: HBASE-11590 > URL: https://issues.apache.org/jira/browse/HBASE-11590 > Project: HBase > Issue Type: Bug > Components: Client, Performance >Affects Versions: 1.0.0, 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: tp.patch > > > The JDK TPE creates all the threads in the pool. As a consequence, we create > (by default) 256 threads even if we just need a few. > The attached TPE create threads only if we have something in the queue. > On a PE test with replica on, it improved the 99 latency percentile by 5%. > Warning: there are likely some race conditions, but I'm posting it here > because there is may be an implementation available somewhere we can use, or > a good reason not to do that. So feedback welcome as usual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
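The allowCoreThreadTimeOut(true) behavior discussed above can be sketched as follows; the pool sizes and keep-alive are illustrative, not HBase's actual client settings:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ElasticPoolSketch {
  // core == max so the pool can grow straight to maxThreads; with
  // allowCoreThreadTimeOut(true) and a short keep-alive, idle threads
  // (including "core" ones) exit, so the pool drains back toward zero
  // instead of keeping 256 parked threads around.
  public static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads,
        keepAliveSeconds, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  public static void main(String[] args) throws Exception {
    ThreadPoolExecutor pool = create(4, 2);
    pool.submit(() -> {}).get(); // at least one thread was created on demand
    System.out.println("largest pool size: " + pool.getLargestPoolSize());
    pool.shutdown();
  }
}
```

Note the trade-off the thread discusses: with core == max the JDK never queues before spawning, but a very short keep-alive means bursty workloads repeatedly pay thread-creation cost.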
[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791588#comment-14791588 ] Lars Hofhansl commented on HBASE-14404: --- I think it's fine either way. +1 on backport. > Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98 > --- > > Key: HBASE-14404 > URL: https://issues.apache.org/jira/browse/HBASE-14404 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 0.98.15 > > Attachments: HBASE-14404-0.98.patch > > > HBASE-14098 adds a new configuration toggle - > "hbase.hfile.drop.behind.compaction" - which if set to "true" tells > compactions to drop pages from the OS blockcache after write. It's on by > default where committed so far but a backport to 0.98 would default it to > off. (The backport would also retain compat methods to LimitedPrivate > interface StoreFileScanner.) What could make it a controversial change in > 0.98 is it changes the default setting of > 'hbase.regionserver.compaction.private.readers' from "false" to "true". I > think it's fine, we use private readers in production. They're stable and do > not present perf issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791577#comment-14791577 ] Andrew Purtell commented on HBASE-14404: ... which seems fine, but we could change the backport to pick up the default setting from the hadoop config if the HBase configuration doesn't specify one way or the other. > Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98 > --- > > Key: HBASE-14404 > URL: https://issues.apache.org/jira/browse/HBASE-14404 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 0.98.15 > > Attachments: HBASE-14404-0.98.patch > > > HBASE-14098 adds a new configuration toggle - > "hbase.hfile.drop.behind.compaction" - which if set to "true" tells > compactions to drop pages from the OS blockcache after write. It's on by > default where committed so far but a backport to 0.98 would default it to > off. (The backport would also retain compat methods to LimitedPrivate > interface StoreFileScanner.) What could make it a controversial change in > 0.98 is it changes the default setting of > 'hbase.regionserver.compaction.private.readers' from "false" to "true". I > think it's fine, we use private readers in production. They're stable and do > not present perf issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791575#comment-14791575 ] Andrew Purtell commented on HBASE-14404: HBase is changing the setting for the embedded DFSclient in HBase, > Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98 > --- > > Key: HBASE-14404 > URL: https://issues.apache.org/jira/browse/HBASE-14404 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 0.98.15 > > Attachments: HBASE-14404-0.98.patch > > > HBASE-14098 adds a new configuration toggle - > "hbase.hfile.drop.behind.compaction" - which if set to "true" tells > compactions to drop pages from the OS blockcache after write. It's on by > default where committed so far but a backport to 0.98 would default it to > off. (The backport would also retain compat methods to LimitedPrivate > interface StoreFileScanner.) What could make it a controversial change in > 0.98 is it changes the default setting of > 'hbase.regionserver.compaction.private.readers' from "false" to "true". I > think it's fine, we use private readers in production. They're stable and do > not present perf issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
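The fallback suggested above (honor the HBase key if set, otherwise take the hadoop-side default) could look roughly like this; the helper and its use of java.util.Properties are hypothetical, and only the key name comes from the issue text:

```java
import java.util.Properties;

public class ConfigFallbackSketch {
  // If "hbase.hfile.drop.behind.compaction" is explicitly set, it wins;
  // otherwise fall back to whatever default the hadoop config supplies.
  static boolean dropBehindCompaction(Properties hbaseConf, boolean hadoopDefault) {
    String v = hbaseConf.getProperty("hbase.hfile.drop.behind.compaction");
    return v == null ? hadoopDefault : Boolean.parseBoolean(v);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();          // nothing set: hadoop default wins
    System.out.println(dropBehindCompaction(conf, true));
    conf.setProperty("hbase.hfile.drop.behind.compaction", "false");
    System.out.println(dropBehindCompaction(conf, true));
  }
}
```

This keeps the 0.98 backport conservative: operators who said nothing inherit the hadoop-side behavior, while an explicit HBase setting always takes precedence.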
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791565#comment-14791565 ] Lei Chen commented on HBASE-14082: -- Thank you all for helping me all the way. > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timeout. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. > # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace__table__region__metric_{code} > to > {code}namespace__table__region__replicaid__metric_{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791563#comment-14791563 ] Hudson commented on HBASE-14274: FAILURE: Integrated in HBase-1.2 #180 (See [https://builds.apache.org/job/HBase-1.2/180/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs > MetricsRegionAggregateSourceImpl > --- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, > HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay. 
> We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 > waiting on condition [0x00011f7ac000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00075636d8c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120) > at > org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41) > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344) > - locked <0x0007ff878190> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. 
They want to > get a write lock on this class's local ReentrantReadWriteLock while holding > MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with > above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting > on condition [0x000140ea5000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007cade1480> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193) > at > org.apache.hadoop.hbase.re
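The hang reported in HBASE-14274 above is a classic lock-ordering inversion: one thread takes lock A then wants lock B while another holds B and wants A. A standard cure is to pick a single global acquisition order and make every path follow it. The sketch below illustrates that idea only; the class and method names are hypothetical, not the HBase code or the actual fix that was committed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of deadlock-free lock ordering: both paths acquire aggregateLock
// first and sourceLock second, so a wait cycle between the two locks is
// impossible.
public class LockOrderSketch {
    static final ReentrantReadWriteLock aggregateLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock sourceLock = new ReentrantReadWriteLock();

    // Models region close deregistering its metrics source.
    static boolean closeRegionMetrics() {
        aggregateLock.writeLock().lock();          // first lock in the global order
        try {
            sourceLock.writeLock().lock();         // second lock
            try {
                return true;                       // deregister would happen here
            } finally {
                sourceLock.writeLock().unlock();
            }
        } finally {
            aggregateLock.writeLock().unlock();
        }
    }

    // Models the JmxCacheBuster snapshotting metrics: same order, read side.
    static boolean snapshotMetrics() {
        aggregateLock.readLock().lock();           // same first lock as close
        try {
            sourceLock.readLock().lock();
            try {
                return true;                       // metric reads would happen here
            } finally {
                sourceLock.readLock().unlock();
            }
        } finally {
            aggregateLock.readLock().unlock();
        }
    }
}
```

With a consistent order neither thread can hold the second lock while waiting on the first, which is exactly the cycle shown in the two stack traces quoted above.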
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791562#comment-14791562 ] Hudson commented on HBASE-14278: FAILURE: Integrated in HBase-1.2 #180 (See [https://builds.apache.org/job/HBase-1.2/180/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:29
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791564#comment-14791564 ] Hudson commented on HBASE-14334: FAILURE: Integrated in HBase-1.2 #180 (See [https://builds.apache.org/job/HBase-1.2/180/]) HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: rev 20f272cb7fdb87598f3e995467853c3770faab55) * pom.xml * hbase-server/pom.xml * hbase-assembly/src/main/assembly/hadoop-two-compat.xml * hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-external-blockcache/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-assembly/pom.xml > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791554#comment-14791554 ] Hudson commented on HBASE-14278: FAILURE: Integrated in HBase-1.3 #182 (See [https://builds.apache.org/job/HBase-1.3/182/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:29
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791556#comment-14791556 ] Hudson commented on HBASE-14274: FAILURE: Integrated in HBase-1.3 #182 (See [https://builds.apache.org/job/HBase-1.3/182/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs > MetricsRegionAggregateSourceImpl > --- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, > HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay. 
> We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 > waiting on condition [0x00011f7ac000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00075636d8c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120) > at > org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41) > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344) > - locked <0x0007ff878190> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. 
They want to > get a write lock on this class's local ReentrantReadWriteLock while holding > MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with > above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting > on condition [0x000140ea5000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007cade1480> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193) > at > org.apache.hadoop.hbase.re
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791555#comment-14791555 ] Hudson commented on HBASE-14082: FAILURE: Integrated in HBase-1.3 #182 (See [https://builds.apache.org/job/HBase-1.3/182/]) HBASE-14082 Add replica id to JMX metrics names (Lei Chen) (enis: rev bb4a690b79a2485d24aa84b9635b7fea0ff6b0d4) * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegion.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperStub.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapper.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionWrapperImpl.java > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timed out. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. 
> # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace__table__region__metric_{code} > to > {code}namespace__table__region__replicaid__metric_{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791557#comment-14791557 ] Hudson commented on HBASE-14334: FAILURE: Integrated in HBase-1.3 #182 (See [https://builds.apache.org/job/HBase-1.3/182/]) HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: rev d4d398d9420506b00562c180259501bf2f5401be) * hbase-server/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-external-blockcache/pom.xml * hbase-assembly/src/main/assembly/hadoop-two-compat.xml * pom.xml * hbase-assembly/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14447) Spark tests failing: bind exception when putting up info server
[ https://issues.apache.org/jira/browse/HBASE-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14447: -- Status: Patch Available (was: Open) > Spark tests failing: bind exception when putting up info server > --- > > Key: HBASE-14447 > URL: https://issues.apache.org/jira/browse/HBASE-14447 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Attachments: 14447.patch > > > Got this: > {code} > Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext > Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec > <<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext > testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext) Time > elapsed: 540.647 sec <<< ERROR! > java.lang.RuntimeException: java.io.IOException: Shutting down > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) > at > org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012) > at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953) > at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:603) > at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:367) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at 
java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139) > at > org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218) > at > org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:154) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041) > at > org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14447) Spark tests failing: bind exception when putting up info server
[ https://issues.apache.org/jira/browse/HBASE-14447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14447: -- Attachment: 14447.patch Same as HBASE-14435 > Spark tests failing: bind exception when putting up info server > --- > > Key: HBASE-14447 > URL: https://issues.apache.org/jira/browse/HBASE-14447 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Attachments: 14447.patch > > > Got this: > {code} > Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext > Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec > <<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext > testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext) Time > elapsed: 540.647 sec <<< ERROR! > java.lang.RuntimeException: java.io.IOException: Shutting down > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) > at > org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012) > at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953) > at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:603) > at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:367) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139) > at > org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218) > at > org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:154) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041) > at > org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14447) Spark tests failing: bind exception when putting up info server
stack created HBASE-14447: - Summary: Spark tests failing: bind exception when putting up info server Key: HBASE-14447 URL: https://issues.apache.org/jira/browse/HBASE-14447 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Priority: Minor Got this: {code} Running org.apache.hadoop.hbase.spark.TestJavaHBaseContext Tests run: 8, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 540.875 sec <<< FAILURE! - in org.apache.hadoop.hbase.spark.TestJavaHBaseContext testBulkDelete(org.apache.hadoop.hbase.spark.TestJavaHBaseContext) Time elapsed: 540.647 sec <<< ERROR! java.lang.RuntimeException: java.io.IOException: Shutting down at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:444) at sun.nio.ch.Net.bind(Net.java:436) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216) at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012) at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953) at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91) at org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1788) at org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:603) at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:367) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139) at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218) at 
org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:154) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1075) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1041) at org.apache.hadoop.hbase.spark.TestJavaHBaseContext.setUp(TestJavaHBaseContext.java:82) ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
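Bind failures like the one in HBASE-14447 above typically occur when the mini cluster tries to put its info server on a port that is already taken, e.g. when surefire runs several test JVMs on the same host. One common remedy is to disable the info servers in test configuration; the fragment below is a sketch of that idea under assumptions (the property names are HBase's standard info-server ports, but this is not the content of 14447.patch).

```xml
<!-- Hypothetical hbase-site.xml fragment for tests; not taken from 14447.patch. -->
<property>
  <name>hbase.master.info.port</name>
  <value>-1</value> <!-- -1 disables the master info server, so nothing binds -->
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>-1</value> <!-- likewise for region server info servers -->
</property>
```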
[jira] [Commented] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791536#comment-14791536 ] Lars Hofhansl commented on HBASE-14404: --- Looking at the patch detail now. It does not allow for sticking with the default setup for HDFS. Whatever the HBase setting is will override whatever was set globally for HDFS, which might be surprising. > Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98 > --- > > Key: HBASE-14404 > URL: https://issues.apache.org/jira/browse/HBASE-14404 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 0.98.15 > > Attachments: HBASE-14404-0.98.patch > > > HBASE-14098 adds a new configuration toggle - > "hbase.hfile.drop.behind.compaction" - which if set to "true" tells > compactions to drop pages from the OS blockcache after write. It's on by > default where committed so far but a backport to 0.98 would default it to > off. (The backport would also retain compat methods to LimitedPrivate > interface StoreFileScanner.) What could make it a controversial change in > 0.98 is it changes the default setting of > 'hbase.regionserver.compaction.private.readers' from "false" to "true". I > think it's fine, we use private readers in production. They're stable and do > not present perf issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
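For reference, the two toggles discussed in HBASE-14404 above would be set like this on 0.98, where the backport defaults drop-behind to off. This is only a sketch of the configuration shape; consult the patch itself for the authoritative defaults.

```xml
<!-- Hypothetical hbase-site.xml fragment; values shown are the opt-in case. -->
<property>
  <name>hbase.hfile.drop.behind.compaction</name>
  <value>true</value> <!-- ask compactions to drop written pages from the OS cache -->
</property>
<property>
  <name>hbase.regionserver.compaction.private.readers</name>
  <value>true</value> <!-- per-compaction readers, the default HBASE-14098 switches to -->
</property>
```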
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791515#comment-14791515 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1078 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1078/]) HBASE-13250 Revert due to compilation error against hadoop-1 profile (tedyu: rev 38995fbd51ac4735b673dd1527cb2631b69b7474) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
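The underlying fix has to apply ownership changes to every directory on the path from the export root down to each hfile, not only the leaf. A toy sketch of collecting those paths in parent-first order, using java.nio.file instead of Hadoop's FileSystem API (class and method names here are illustrative, not from the actual patch):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

public class ChownParents {
  /**
   * Collect every path component from {@code root} (exclusive) down to
   * {@code leaf} (inclusive), parent-first, so an owner/group change can be
   * applied to the intermediate directories as well as the leaf hfile.
   */
  static Set<Path> pathsToChown(Path root, Path leaf) {
    Set<Path> out = new LinkedHashSet<>();
    Path cur = root;
    // Walk root -> leaf so parents are emitted before their children.
    for (Path part : root.relativize(leaf)) {
      cur = cur.resolve(part);
      out.add(cur);
    }
    return out;
  }
}
```

In the real ExportSnapshot code the equivalent loop would call Hadoop's FileSystem.setOwner on each collected path.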
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791507#comment-14791507 ] Hudson commented on HBASE-14334: FAILURE: Integrated in HBase-TRUNK #6816 (See [https://builds.apache.org/job/HBase-TRUNK/6816/]) HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: rev 7b08f4c8be60582cd02ba31161be214c9c9d40f9) * pom.xml * hbase-server/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-assembly/src/main/assembly/hadoop-two-compat.xml * hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-assembly/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-external-blockcache/pom.xml > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-14082: -- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 Status: Resolved (was: Patch Available) I have committed this to 1.2+. Thanks Lei for the patch. > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timeout. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. > # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace__table__region__metric_{code} > to > {code}namespace__table__region__replicaid__metric_{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
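The proposed change is mechanical: splice the replica id in as one more underscore-delimited component of the metric name. A hypothetical helper showing the before/after shape (the separator and component order follow the pattern quoted above, simplified; this is not actual HBase code):

```java
public class ReplicaMetricName {
  /** Old shape: namespace_table_region_metric (simplified). */
  static String withoutReplica(String ns, String table, String region, String metric) {
    return String.join("_", ns, table, region, metric);
  }

  /** Proposed shape: replica id inserted between the region and metric parts. */
  static String withReplica(String ns, String table, String region, int replicaId, String metric) {
    return String.join("_", ns, table, region, Integer.toString(replicaId), metric);
  }
}
```

With names in this shape, a JMX consumer can filter primaries (replica id 0) from replicas without parsing region encodings.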
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791455#comment-14791455 ] Hudson commented on HBASE-14334: FAILURE: Integrated in HBase-1.2-IT #152 (See [https://builds.apache.org/job/HBase-1.2-IT/152/]) HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: rev 20f272cb7fdb87598f3e995467853c3770faab55) * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * pom.xml * hbase-external-blockcache/pom.xml * hbase-assembly/pom.xml * hbase-assembly/src/main/assembly/hadoop-two-compat.xml * hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-server/pom.xml > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791454#comment-14791454 ] Hudson commented on HBASE-14274: FAILURE: Integrated in HBase-1.2-IT #152 (See [https://builds.apache.org/job/HBase-1.2-IT/152/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs > MetricsRegionAggregateSourceImpl > --- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, > HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay. 
> We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 > waiting on condition [0x00011f7ac000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00075636d8c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120) > at > org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41) > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344) > - locked <0x0007ff878190> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. 
They want to > get a write lock on this classes local ReentrantReadWriteLock while holding > MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with > above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting > on condition [0x000140ea5000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007cade1480> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193) > at > org.apache.hadoop.hb
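The hang quoted above is a classic lock-order inversion: the close path acquires the region source's lock and then the aggregate's, while the snapshot path acquires them in the opposite order. A generic sketch of the usual remedy, agreeing on one global acquisition order on every path (class and lock names here are illustrative, not the actual HBASE-14274 fix):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class OrderedLocks {
  // Two locks standing in for the aggregate-level and per-region-source locks.
  static final ReentrantReadWriteLock AGGREGATE = new ReentrantReadWriteLock();
  static final ReentrantReadWriteLock SOURCE = new ReentrantReadWriteLock();

  /**
   * Close path. In the deadlocked version one thread held SOURCE's write lock
   * and then asked for AGGREGATE's; here every path agrees on
   * AGGREGATE-before-SOURCE, so circular wait is impossible.
   */
  static void close(Runnable work) {
    AGGREGATE.writeLock().lock();
    try {
      SOURCE.writeLock().lock();
      try {
        work.run();
      } finally {
        SOURCE.writeLock().unlock();
      }
    } finally {
      AGGREGATE.writeLock().unlock();
    }
  }

  /** Snapshot path: same global order, reader side. */
  static void snapshot(Runnable work) {
    AGGREGATE.readLock().lock();
    try {
      SOURCE.readLock().lock();
      try {
        work.run();
      } finally {
        SOURCE.readLock().unlock();
      }
    } finally {
      AGGREGATE.readLock().unlock();
    }
  }
}
```

Another common remedy, which avoids nested locking entirely, is to snapshot from an immutable copy-on-write collection so the read path needs no lock at all.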
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791453#comment-14791453 ] Hudson commented on HBASE-14278: FAILURE: Integrated in HBase-1.2-IT #152 (See [https://builds.apache.org/job/HBase-1.2-IT/152/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev a229ac91fbab2608ae89bbe44b1dd05e5aef1183) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791448#comment-14791448 ] Hudson commented on HBASE-14278: SUCCESS: Integrated in HBase-1.3-IT #162 (See [https://builds.apache.org/job/HBase-1.3-IT/162/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e) * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.j
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791450#comment-14791450 ] Hudson commented on HBASE-14334: SUCCESS: Integrated in HBase-1.3-IT #162 (See [https://builds.apache.org/job/HBase-1.3-IT/162/]) HBASE-14334 Move Memcached block cache in to it's own optional module. (eclark: rev d4d398d9420506b00562c180259501bf2f5401be) * hbase-assembly/src/main/assembly/hadoop-two-compat.xml * hbase-external-blockcache/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-server/pom.xml * hbase-assembly/pom.xml * pom.xml * hbase-external-blockcache/src/main/java/org/apache/hadoop/hbase/io/hfile/MemcachedBlockCache.java > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14274) Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs MetricsRegionAggregateSourceImpl
[ https://issues.apache.org/jira/browse/HBASE-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791449#comment-14791449 ] Hudson commented on HBASE-14274: SUCCESS: Integrated in HBase-1.3-IT #162 (See [https://builds.apache.org/job/HBase-1.3-IT/162/]) HBASE-14278 Fix NPE that is showing up since HBASE-14274 went in (eclark: rev 2029e851827fa1bf59436c7baa1971b52ac5833e) * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/metrics2/impl/JmxCacheBuster.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java > Deadlock in region metrics on shutdown: MetricsRegionSourceImpl vs > MetricsRegionAggregateSourceImpl > --- > > Key: HBASE-14274 > URL: https://issues.apache.org/jira/browse/HBASE-14274 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14274-addendum.txt, 23612.stack, HBASE-14274-v1.patch, > HBASE-14274.patch > > > Looking into parent issue, got a hang locally of TestDistributedLogReplay. 
> We have region closes here: > {code} > "RS_CLOSE_META-localhost:59610-0" prio=5 tid=0x7ff65c03f800 nid=0x54347 > waiting on condition [0x00011f7ac000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00075636d8c0> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:78) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:120) > at > org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:41) > at > org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1500) > at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344) > - locked <0x0007ff878190> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:102) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:103) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {code} > They are trying to MetricsRegionAggregateSourceImpl.deregister. 
They want to > get a write lock on this classes local ReentrantReadWriteLock while holding > MetricsRegionSourceImpl's readWriteLock write lock. > Then, elsewhere the JmxCacheBuster is running trying to get metrics with > above locks held in reverse: > {code} > "HBase-Metrics2-1" daemon prio=5 tid=0x7ff65e14b000 nid=0x59a03 waiting > on condition [0x000140ea5000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x0007cade1480> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.snapshot(MetricsRegionSourceImpl.java:193) > at > org.apache.hadoop.hb
[jira] [Reopened] (HBASE-14275) Backport to 0.98 HBASE-10785 Metas own location should be cached
[ https://issues.apache.org/jira/browse/HBASE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-14275: I'm seeing instability in TestAssignmentManagerOnCluster and TestZKLessAMOnCluster and a bisect led back to this change. Let me repeat the bisect and update shortly. > Backport to 0.98 HBASE-10785 Metas own location should be cached > > > Key: HBASE-14275 > URL: https://issues.apache.org/jira/browse/HBASE-14275 > Project: HBase > Issue Type: Improvement >Reporter: Jerry He >Assignee: Jerry He > Fix For: 0.98.14 > > Attachments: HBASE-14275-0.98.patch > > > We've seen similar problem reported on 0.98. > It is good improvement to have. > This will cover HBASE-10785 and the later HBASE-11332. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791393#comment-14791393 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-1.2 #179 (See [https://builds.apache.org/job/HBase-1.2/179/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev b243c898e72d835a731d893c853c958072d42038) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791384#comment-14791384 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-0.98 #1124 (See [https://builds.apache.org/job/HBase-0.98/1124/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev bcd986e47b8d633c996c8a2040c2a40b32cb5c59) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791377#comment-14791377 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-1.3 #181 (See [https://builds.apache.org/job/HBase-1.3/181/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev 6598f18e564bf06348e99863548546f092808c35) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14278: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. > {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run(FutureTask.java:262) > 119143 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > 119144 › at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) > 
119145 › at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > 119146 › at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > 119147 › at java.lang.Thread.run(Thread.java:744) > 119148 Caused by: java.lang.NullPointerException > 119149 › at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:198) > 119150 › at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3158) > 119151 › at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:5652) > 119152 › ... 32 more > {code} -- This message was sent by Atlassian JIRA
[jira] [Created] (HBASE-14446) Save table descriptors and region infos during incremental backup
Vladimir Rodionov created HBASE-14446: - Summary: Save table descriptors and region infos during incremental backup Key: HBASE-14446 URL: https://issues.apache.org/jira/browse/HBASE-14446 Project: HBase Issue Type: Sub-task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 The current implementation of incremental backup just moves WAL files into the backup directory. The restore procedure of incremental backup relies on full restore (from snapshot) as the source of all table meta. Two problems: # Table configuration/properties may be changed after the full backup, and we will lose this info during restore # We cannot convert WAL files into HFiles w/o having the table description and layout; hence the need for a merge tool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-14446) Save table descriptors and region infos during incremental backup
[ https://issues.apache.org/jira/browse/HBASE-14446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-14446 started by Vladimir Rodionov. - > Save table descriptors and region infos during incremental backup > -- > > Key: HBASE-14446 > URL: https://issues.apache.org/jira/browse/HBASE-14446 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > > The current implementation of incremental backup just moves WAL files into > the backup directory. The restore procedure of incremental backup relies on full > restore (from snapshot) as the source of all table meta. > Two problems: > # Table configuration/properties may be changed after the full backup, and we will > lose this info during restore > # We cannot convert WAL files into HFiles w/o having the table description and > layout; hence the need for a merge tool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression
[ https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791304#comment-14791304 ] Lars Hofhansl commented on HBASE-14352: --- That's a good point. If there's no advantage to compressed WALs ever, let's get rid of the code. I think [~abhishek.chouhan] found write performance to be neutral with much-reduced storage (20%). Only replication was significantly slower. Would certainly be nice if we could compress between DCs when doing replication (but that's a different issue). > Replication is terribly slow with WAL compression > - > > Key: HBASE-14352 > URL: https://issues.apache.org/jira/browse/HBASE-14352 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.13 >Reporter: Abhishek Singh Chouhan > Attachments: age_of_last_shipped.png, size_of_log_queue.png > > > For the same load, replication with WAL compression enabled is almost 6x > slower than with compression turned off. Age of last shipped operation is > also correspondingly much higher when compression is turned on. > By observing Size of log queue we can see that it is taking too much time for > the queue to clear up. > Attaching corresponding graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791295#comment-14791295 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-TRUNK #6815 (See [https://builds.apache.org/job/HBase-TRUNK/6815/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev 08eabb89f60b821362efaba2701ddb9db5ff8b32) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14334: -- Resolution: Fixed Release Note: Move external block cache to its own module. This will reduce dependencies for people who use hbase-server. Currently Memcached is the reference implementation for external block cache. External block caches allow HBase to take advantage of other, more complex caches that can live longer than the HBase regionserver process and are not necessarily tied to a single computer's lifetime. However, external block caches add extra operational overhead. Status: Resolved (was: Patch Available) > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791253#comment-14791253 ] stack commented on HBASE-10449: --- Ok. Not what we want. Lets look at alternative... > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-13250. Resolution: Fixed Fix Version/s: 0.98.15 Patch 13250-0.98-v2.txt compiles with both hadoop-2 and hadoop-1 profiles. > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13250: --- Attachment: 13250-0.98-v2.txt > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: 13250-0.98-v2.txt, HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791241#comment-14791241 ] Hadoop QA commented on HBASE-14334: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756333/HBASE-14334-v1.patch against master branch at commit bd26386dc7205c9b30b8488bc094bd380ec09adb. ATTACHMENT ID: 12756333 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> + ${project.build.directory}/test-classes/mrapp-generated-classpath + ${project.build.directory}/test-classes/mrapp-generated-classpath {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestReplicationShell Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15626//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15626//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15626//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15626//console This message is automatically generated. > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791204#comment-14791204 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-1.1 #665 (See [https://builds.apache.org/job/HBase-1.1/665/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev a1f45c1c43dfda4b044f948d4de5089662aa306b) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791198#comment-14791198 ] Hudson commented on HBASE-13250: SUCCESS: Integrated in HBase-1.0 #1053 (See [https://builds.apache.org/job/HBase-1.0/1053/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev e12b771560b94ee7843225af36f0857e6571a10a) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791189#comment-14791189 ] Hudson commented on HBASE-14433: SUCCESS: Integrated in HBase-1.3-IT #161 (See [https://builds.apache.org/job/HBase-1.3-IT/161/]) HBASE-14433 Set down the client executor core thread count from 256 in tests: REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 82554e275017bf1eb941a3b3c3145f5c2516cf54) * hbase-client/src/test/resources/hbase-site.xml * hbase-server/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791190#comment-14791190 ] Hudson commented on HBASE-13250: SUCCESS: Integrated in HBase-1.3-IT #161 (See [https://builds.apache.org/job/HBase-1.3-IT/161/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev 6598f18e564bf06348e99863548546f092808c35) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791155#comment-14791155 ] Hudson commented on HBASE-14433: SUCCESS: Integrated in HBase-1.2-IT #151 (See [https://builds.apache.org/job/HBase-1.2-IT/151/]) HBASE-14433 Set down the client executor core thread count from 256 in tests: REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 5764fab04d6234c77ec0333c1878237f420cc83c) * hbase-server/src/test/resources/hbase-site.xml * hbase-client/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791156#comment-14791156 ] Hudson commented on HBASE-13250: SUCCESS: Integrated in HBase-1.2-IT #151 (See [https://builds.apache.org/job/HBase-1.2-IT/151/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev b243c898e72d835a731d893c853c958072d42038) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791145#comment-14791145 ] Nicolas Liochon commented on HBASE-10449: - It's the former: in this case, the queries are queued. A new thread will be created only when the queue is full. Then, if we reach maxThreads and the queue is full the new tasks are rejected. In our case the queue is nearly unbounded, so we stay with corePoolSize. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791140#comment-14791140 ] Hudson commented on HBASE-14433: FAILURE: Integrated in HBase-1.2 #178 (See [https://builds.apache.org/job/HBase-1.2/178/]) HBASE-14433 Set down the client executor core thread count from 256 in tests: REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 5764fab04d6234c77ec0333c1878237f420cc83c) * hbase-server/src/test/resources/hbase-site.xml * hbase-client/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791135#comment-14791135 ] stack commented on HBASE-10449: --- That makes sense. What happens if a query arrives every second: i.e. there are periods when we have more queries than coreSize? Do the > coreSize queries go in the queue, or do we make new threads to handle them? If the latter, good; if the former, bad. Let me look at the other issue. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13250: --- Fix Version/s: (was: 0.98.15) > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-13250: Reverted from 0.98 due to compilation error against hadoop-1 profile > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791129#comment-14791129 ] Nicolas Liochon commented on HBASE-10449: - The algo for the ThreadPoolExecutor is:
{code}
onNewTask() {
  if (currentSize < coreSize) createNewThread();
  else reuseThread();
}
{code}
And there is a timeout for each thread. So if we do a coreSize of 2, a timeout of 20s, and a query every 15s, we have:
0s: query1: create thread1, poolSize=1
15s: query2: create thread2, poolSize=2
20s: close thread1, poolSize=1
30s: query3: create thread3, poolSize=2
35s: close thread2, poolSize=1
45s: query4: create thread4, poolSize=2
And so on. So even if we have 1 query each 15s, we have 2 threads in the pool nearly all the time. > Yes. Smile. Need to revive it for here and for doing client timeouts. I found the code in TestClientNoCluster#run, ready to be reused! I think we need to go for a hack like the one on Stackoverflow or for a different implementation of TPE like HBASE-11590... > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
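To make the queuing behavior under discussion concrete, here is a minimal standalone Java sketch (not HBase code; the class name and sizes are illustrative) showing the other side of the same coin: when a ThreadPoolExecutor is backed by an unbounded LinkedBlockingQueue, the pool never grows past corePoolSize — excess tasks are queued rather than spawning threads, since threads beyond core are only created when the queue rejects an offer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
    public static void main(String[] args) throws Exception {
        // core=2, max=256, keepAlive=20s — but with an unbounded queue
        // the pool never grows past corePoolSize: excess tasks are queued.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 256, 20, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        // Latch keeps all submitted tasks blocked so none completes early.
        final CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try { block.await(); } catch (InterruptedException e) { }
            });
        }

        // 10 tasks submitted, yet only the 2 core threads exist;
        // the remaining 8 tasks sit in the queue.
        System.out.println("poolSize=" + pool.getPoolSize());   // prints poolSize=2
        System.out.println("queued=" + pool.getQueue().size()); // prints queued=8

        block.countDown();
        pool.shutdown();
    }
}
```

This is why max-thread settings are effectively ignored with a (nearly) unbounded work queue, and why tuning has to happen on corePoolSize, as the comments above note.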
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791127#comment-14791127 ] Hudson commented on HBASE-14433: FAILURE: Integrated in HBase-1.3 #180 (See [https://builds.apache.org/job/HBase-1.3/180/]) HBASE-14433 Set down the client executor core thread count from 256 in tests: REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev 82554e275017bf1eb941a3b3c3145f5c2516cf54) * hbase-server/src/test/resources/hbase-site.xml * hbase-client/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791120#comment-14791120 ] Hudson commented on HBASE-13250: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1077 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1077/]) HBASE-13250 chown of ExportSnapshot does not cover all path and files (He Liangliang) (tedyu: rev bcd986e47b8d633c996c8a2040c2a40b32cb5c59) * hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot
[ https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791074#comment-14791074 ] Hadoop QA commented on HBASE-14128: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756303/HBASE-14128-v0.patch against master branch at commit bd26386dc7205c9b30b8488bc094bd380ec09adb. ATTACHMENT ID: 12756303 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1837 checkstyle errors (more than the master's current 1835 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportExport org.apache.hadoop.hbase.util.TestProcessBasedCluster org.apache.hadoop.hbase.regionserver.TestWALLockup Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15625//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15625//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15625//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15625//console This message is automatically generated. > Fix inability to run Multiple MR over the same Snapshot > --- > > Key: HBASE-14128 > URL: https://issues.apache.org/jira/browse/HBASE-14128 > Project: HBase > Issue Type: Bug > Components: mapreduce, snapshots >Reporter: Matteo Bertozzi >Assignee: santosh kumar >Priority: Minor > Labels: beginner, noob > Attachments: HBASE-14128-v0.patch > > > from the list, running multiple MR over the same snapshot does not work > {code} > public static void copySnapshotForScanner(Configuration conf, FileSystem .. > RestoreSnapshotHelper helper = new RestoreSnapshotHelper(conf, fs, > manifest, manifest.getTableDescriptor(), restoreDir, monitor, status); > {code} > the problem is that manifest.getTableDescriptor() will try to clone the > snapshot with the same target name. ending up in "file already exist" > exceptions. > we just need to clone that descriptor and generate a new target table name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression
[ https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791070#comment-14791070 ] Andrew Purtell commented on HBASE-14352: When I've tested wal compression I've found the hit to write performance (increased latency leading to a lower aggregate write ceiling cluster-wide) to outweigh space savings and any gains from that. Is this the general experience? Maybe the answer is to deprecate WAL compression? > Replication is terribly slow with WAL compression > - > > Key: HBASE-14352 > URL: https://issues.apache.org/jira/browse/HBASE-14352 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.13 >Reporter: Abhishek Singh Chouhan > Attachments: age_of_last_shipped.png, size_of_log_queue.png > > > For the same load, replication with WAL compression enabled is almost 6x > slower than with compression turned off. Age of last shipped operation is > also correspondingly much higher when compression is turned on. > By observing Size of log queue we can see that it is taking too much time for > the queue to clear up. > Attaching corresponding graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13250: --- Hadoop Flags: Reviewed Fix Version/s: 1.1.3 1.0.3 0.98.15 1.3.0 1.2.0 2.0.0 > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
[ https://issues.apache.org/jira/browse/HBASE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13250: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch, Liangliang. > chown of ExportSnapshot does not cover all path and files > - > > Key: HBASE-13250 > URL: https://issues.apache.org/jira/browse/HBASE-13250 > Project: HBase > Issue Type: Bug >Reporter: He Liangliang >Assignee: He Liangliang >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-13250-V0.patch > > > The chuser/chgroup function only covers the leaf hfile. The ownership of > hfile parent paths and snapshot reference files are not changed as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791053#comment-14791053 ] Samir Ahmic commented on HBASE-14431: - This is interesting. I have run TestFastFail several times on two different machines and test never fails. I was using java 1.7.0_80 and 1.7.0_71 - > AsyncRpcClient#removeConnection() never removes connection from connections > pool if server fails > > > Key: HBASE-14431 > URL: https://issues.apache.org/jira/browse/HBASE-14431 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2, 1.1.2 >Reporter: Samir Ahmic >Assignee: Samir Ahmic >Priority: Critical > Attachments: HBASE-14431.patch > > > I was playing with master branch in distributed mode (3 rs + master + > backup_master) and notice strange behavior when i was testing this sequence > of events on single rs: /kill/start/run_balancer while client was writing > data to cluster (LoadTestTool). > I have notice that LTT fails with following: > {code} > 2015-09-09 11:05:58,364 INFO [main] client.AsyncProcess: #2, waiting for > some tasks to finish. 
Expected max=0, tasksInProgress=35 > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: BindException: 1 time, > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208) > at > org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211) > {code} > After some digging and adding some more logging in code i have notice that > following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel > connection) {code} is never true: > {code} > if (connectionInPool == connection) { > {code} > causing that {code}AsyncRpcChannel{code} connection is never removed from > {code}connections{code} pool in case rs fails. > After changing this condition to: > {code} > if (connectionInPool.address.equals(connection.address)) { > {code} > issue was resolved and client was removing failed server from connections > pool. > I will attach patch after running some more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
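The failure mode described above reduces to a minimal sketch. The types below are illustrative stand-ins, not the real AsyncRpcClient/AsyncRpcChannel classes: after a region server restarts, the channel instance held by the caller is a different object from the one stored in the pool, so an identity (==) check never fires, while an address comparison does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the connection pool. The buggy variant mirrors the reported
// condition `if (connectionInPool == connection)`; the fixed variant mirrors
// `if (connectionInPool.address.equals(connection.address))`.
class ConnectionPool {
    static class Channel {
        final String address; // host:port the channel points at
        Channel(String address) { this.address = address; }
    }

    private final Map<Integer, Channel> connections = new ConcurrentHashMap<>();

    void put(int key, Channel c) { connections.put(key, c); }
    int size() { return connections.size(); }

    // Buggy: identity comparison never matches a distinct channel object
    // created for the same (failed) server.
    boolean removeByIdentity(int key, Channel failed) {
        Channel inPool = connections.get(key);
        if (inPool == failed) {
            connections.remove(key);
            return true;
        }
        return false;
    }

    // Fixed: compare the remote address instead.
    boolean removeByAddress(int key, Channel failed) {
        Channel inPool = connections.get(key);
        if (inPool != null && inPool.address.equals(failed.address)) {
            connections.remove(key);
            return true;
        }
        return false;
    }
}
```

With the identity check the stale entry stays in the pool forever, which is why the client keeps retrying the dead server.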
[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression
[ https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791052#comment-14791052 ] Lars Hofhansl commented on HBASE-14352: --- I took a look at the code some weeks back. The problem immediately jumps out... At the source we constantly reset the read position into the current WAL. With compression it means we have to start from a point where the compression dictionary is written. That is very expensive. We have to do that in order to be sure we'll see the edits in the current block being written. So I don't immediately see a way out of it. Perhaps we simply tail until we reach the end of a file. In that case we'll try one more time with a reset, and only declare the WAL done when that final pass completes. > Replication is terribly slow with WAL compression > - > > Key: HBASE-14352 > URL: https://issues.apache.org/jira/browse/HBASE-14352 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.13 >Reporter: Abhishek Singh Chouhan > Attachments: age_of_last_shipped.png, size_of_log_queue.png > > > For the same load, replication with WAL compression enabled is almost 6x > slower than with compression turned off. Age of last shipped operation is > also correspondingly much higher when compression is turned on. > By observing Size of log queue we can see that it is taking too much time for > the queue to clear up. > Attaching corresponding graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
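The cost of those resets can be illustrated with a toy dictionary codec. This is purely illustrative, not the real WAL compression code: once entries are encoded as back-references into a dictionary accumulated while reading, a reader cannot resume at entry i without replaying every earlier entry to rebuild that dictionary state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reader for a dictionary-compressed log. An encoded entry is either a
// String literal (which the reader learns into its dictionary) or an Integer
// back-reference into the dictionary built so far.
class DictReader {
    private final List<String> dictionary = new ArrayList<>();
    private int replayedEntries = 0;

    String decodeNext(Object encoded) {
        replayedEntries++;
        if (encoded instanceof Integer) {
            return dictionary.get((Integer) encoded); // back-reference
        }
        String literal = (String) encoded;
        dictionary.add(literal);                      // literal, learn it
        return literal;
    }

    // A position reset forces decoding from the start: to read entry
    // `target` we must replay entries 0..target to rebuild the dictionary.
    static int costOfSeek(List<?> encodedLog, int target) {
        DictReader r = new DictReader();
        for (int i = 0; i <= target; i++) {
            r.decodeNext(encodedLog.get(i));
        }
        return r.replayedEntries; // target + 1 entries decoded, not 1
    }
}
```

This linear replay on every reset is the per-reset cost Lars describes; without compression the reader could seek directly.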
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791038#comment-14791038 ] Hudson commented on HBASE-14433: FAILURE: Integrated in HBase-TRUNK #6814 (See [https://builds.apache.org/job/HBase-TRUNK/6814/]) HBASE-14433 Set down the client executor core thread count from 256 in tests: REAPPLY AGAIN (WAS MISSING JIRA) (stack: rev bd26386dc7205c9b30b8488bc094bd380ec09adb) * hbase-server/src/test/resources/hbase-site.xml * hbase-client/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
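The 256-idle-threads symptom follows from java.util.concurrent.ThreadPoolExecutor semantics: with corePoolSize equal to the maximum, every thread created to absorb a burst is a core thread and stays alive while idle (unless allowCoreThreadTimeOut is set). A minimal sketch, not HBase code:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Submit a burst of no-op tasks and observe how many threads the pool keeps.
// With an unbounded queue, workers are only created while the pool is below
// corePoolSize; every such worker is a core thread and survives being idle.
class PoolDemo {
    static ThreadPoolExecutor bursty(int core, int max) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> { }); // trivial task; finishes immediately
        }
        return pool;
    }
}
```

With core == max == 256 (the pre-patch test config), a burst leaves up to 256 idle threads per client in every thread dump; with a small core size the extra tasks simply queue.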
[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14334: -- Attachment: HBASE-14334-v1.patch Patch with a better description. > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options
[ https://issues.apache.org/jira/browse/HBASE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-14445. Resolution: Duplicate > ExportSnapshot does not honor -chuser, -chgroup, -chmod options > --- > > Key: HBASE-14445 > URL: https://issues.apache.org/jira/browse/HBASE-14445 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.4 >Reporter: Ted Yu > > Create a snapshot of an existing HBase table, export the snapshot using the > -chuser, -chgroup, -chmod options. > Look in hdfs filesystem for export. The files do not have the correct > ownership, group, permissions > Thanks to Ian Roberts who first reported the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790979#comment-14790979 ] Elliott Clark commented on HBASE-14334: --- bq.The above is all the doc I'd see this module getting so say something about when it'd be used and how to enable it. I'm still hoping to provide better. You know how that goes though. > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334-v1.patch, HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790967#comment-14790967 ] Hudson commented on HBASE-14433: FAILURE: Integrated in HBase-TRUNK #6813 (See [https://builds.apache.org/job/HBase-TRUNK/6813/]) Revert "HBASE-14433 Set down the client executor core thread count from 256 to number of processors" (stack: rev 8633b26ee5095e82a9792a86dc5c95a4cf23f858) * hbase-client/src/test/resources/hbase-site.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java * hbase-server/src/test/resources/hbase-site.xml > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options
[ https://issues.apache.org/jira/browse/HBASE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790963#comment-14790963 ] Matteo Bertozzi commented on HBASE-14445: - isn't this the same as HBASE-13250? > ExportSnapshot does not honor -chuser, -chgroup, -chmod options > --- > > Key: HBASE-14445 > URL: https://issues.apache.org/jira/browse/HBASE-14445 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.4 >Reporter: Ted Yu > > Create a snapshot of an existing HBase table, export the snapshot using the > -chuser, -chgroup, -chmod options. > Look in hdfs filesystem for export. The files do not have the correct > ownership, group, permissions > Thanks to Ian Roberts who first reported the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14445) ExportSnapshot does not honor -chuser, -chgroup, -chmod options
Ted Yu created HBASE-14445: -- Summary: ExportSnapshot does not honor -chuser, -chgroup, -chmod options Key: HBASE-14445 URL: https://issues.apache.org/jira/browse/HBASE-14445 Project: HBase Issue Type: Bug Affects Versions: 0.98.4 Reporter: Ted Yu Create a snapshot of an existing HBase table, export the snapshot using the -chuser, -chgroup, -chmod options. Look in hdfs filesystem for export. The files do not have the correct ownership, group, permissions Thanks to Ian Roberts who first reported the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14352) Replication is terribly slow with WAL compression
[ https://issues.apache.org/jira/browse/HBASE-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790911#comment-14790911 ] Abhishek Singh Chouhan commented on HBASE-14352: Yep...both of them had compression enabled. > Replication is terribly slow with WAL compression > - > > Key: HBASE-14352 > URL: https://issues.apache.org/jira/browse/HBASE-14352 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.13 >Reporter: Abhishek Singh Chouhan > Attachments: age_of_last_shipped.png, size_of_log_queue.png > > > For the same load, replication with WAL compression enabled is almost 6x > slower than with compression turned off. Age of last shipped operation is > also correspondingly much higher when compression is turned on. > By observing Size of log queue we can see that it is taking too much time for > the queue to clear up. > Attaching corresponding graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14443) Add request parameter to the TooSlow/TooLarge warn message of RpcServer
[ https://issues.apache.org/jira/browse/HBASE-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790841#comment-14790841 ] Nick Dimiduk commented on HBASE-14443: -- Agreed. Anything will help here. Also, HBASE-14333. > Add request parameter to the TooSlow/TooLarge warn message of RpcServer > --- > > Key: HBASE-14443 > URL: https://issues.apache.org/jira/browse/HBASE-14443 > Project: HBase > Issue Type: Improvement > Components: rpc >Reporter: Jianwei Cui >Priority: Minor > Fix For: 1.2.1 > > > The RpcServer will log a warn message for TooSlow or TooLarge request as: > {code} > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > {code} > The RpcServer#logResponse will create the warn message as: > {code} > if (params.length == 2 && server instanceof HRegionServer && > params[0] instanceof byte[] && > params[1] instanceof Operation) { > ... > responseInfo.putAll(((Operation) params[1]).toMap()); > ... > } else if (params.length == 1 && server instanceof HRegionServer && > params[0] instanceof Operation) { > ... > responseInfo.putAll(((Operation) params[0]).toMap()); > ... > } else { > ... > } > {code} > Because the parameter is always a protobuf message, not an instance of > Operation, the request parameter will not be added into the warn message. The > parameter is helpful to find out the problem, for example, knowing the > startRow/endRow is useful for a TooSlow scan. To improve the warn message, we > can transform the protobuf request message to corresponding Operation > subclass object by ProtobufUtil, so that it can be added the warn message. > Suggestion and discussion are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
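A minimal sketch of the branch Jianwei describes, with stand-in types rather than the real RpcServer/Operation classes: because handlers pass the raw protobuf message, the instanceof Operation branches never match, the generic else path wins, and the logged map carries no request detail.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the TooSlow/TooLarge logging path. Only params implementing
// Operation get expanded via toMap(); a raw protobuf message falls through.
class SlowLog {
    interface Operation { Map<String, Object> toMap(); }
    static class ScanRequestProto { } // stand-in for a protobuf request

    static Map<String, Object> logResponse(Object[] params) {
        Map<String, Object> responseInfo = new HashMap<>();
        if (params.length == 1 && params[0] instanceof Operation) {
            responseInfo.putAll(((Operation) params[0]).toMap());
        } else {
            // what the warn message contains today: no request parameters
            responseInfo.put("detail", "unavailable");
        }
        return responseInfo;
    }
}
```

The proposed improvement amounts to converting the protobuf message into an Operation (via ProtobufUtil) before this dispatch, so the first branch fires and startRow/stopRow and friends reach the log.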
[jira] [Commented] (HBASE-14442) MultiTableInputFormatBase.getSplits does not build split for a scan whose startRow=stopRow=(startRow of a region)
[ https://issues.apache.org/jira/browse/HBASE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790833#comment-14790833 ] Nick Dimiduk commented on HBASE-14442: -- Hi Nathan, can you provide a unit test that demonstrates this bug? See https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java for existing tests. > MultiTableInputFormatBase.getSplits dosenot build split for a scan whose > startRow=stopRow=(startRow of a region) > > > Key: HBASE-14442 > URL: https://issues.apache.org/jira/browse/HBASE-14442 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 1.1.2 >Reporter: Nathan >Assignee: Nathan > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > I created a Scan whose startRow and stopRow are the same with a region's > startRow, then I found no map was built. > The following is the source code of this condtion: > (startRow.length == 0 || keys.getSecond()[i].length == 0 || > Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) && > (stopRow.length == 0 || Bytes.compareTo(stopRow, > keys.getFirst()[i]) > 0) > I think a "=" should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
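The boundary condition is easy to reproduce with plain byte comparison. The sketch below is a simplified stand-in for the getSplits check, not the real MultiTableInputFormatBase code (it uses signed Arrays.compare rather than HBase's unsigned Bytes.compareTo, which does not matter for these small positive values): with the strict '>' on the stop row, a scan whose startRow and stopRow both equal a region's start key selects no region.

```java
import java.util.Arrays;

// Region i spans [regionStart, regionEnd); the scan covers
// [scanStart, scanStop]. Empty byte[] means "unbounded", as in HBase.
class SplitCheck {
    static int cmp(byte[] a, byte[] b) { return Arrays.compare(a, b); }

    // The condition as quoted in the report: strict '>' on the stop row.
    static boolean overlapsStrict(byte[] scanStart, byte[] scanStop,
                                  byte[] regionStart, byte[] regionEnd) {
        return (scanStart.length == 0 || regionEnd.length == 0
                || cmp(scanStart, regionEnd) < 0)
            && (scanStop.length == 0 || cmp(scanStop, regionStart) > 0);
    }

    // With the '=' the reporter suggests adding to the stop-row comparison.
    static boolean overlapsInclusive(byte[] scanStart, byte[] scanStop,
                                     byte[] regionStart, byte[] regionEnd) {
        return (scanStart.length == 0 || regionEnd.length == 0
                || cmp(scanStart, regionEnd) < 0)
            && (scanStop.length == 0 || cmp(scanStop, regionStart) >= 0);
    }
}
```

When scanStart == scanStop == regionStart, the strict variant evaluates cmp(scanStop, regionStart) == 0 and rejects the region, so no split (and hence no map task) is built.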
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790813#comment-14790813 ] Hadoop QA commented on HBASE-14431: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756275/HBASE-14431.patch against master branch at commit d2e338181800ae3cef55ddca491901b65259dc7f. ATTACHMENT ID: 12756275 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFastFail Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15624//console This message is automatically generated. > AsyncRpcClient#removeConnection() never removes connection from connections > pool if server fails > > > Key: HBASE-14431 > URL: https://issues.apache.org/jira/browse/HBASE-14431 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2, 1.1.2 >Reporter: Samir Ahmic >Assignee: Samir Ahmic >Priority: Critical > Attachments: HBASE-14431.patch > > > I was playing with master branch in distributed mode (3 rs + master + > backup_master) and notice strange behavior when i was testing this sequence > of events on single rs: /kill/start/run_balancer while client was writing > data to cluster (LoadTestTool). > I have notice that LTT fails with following: > {code} > 2015-09-09 11:05:58,364 INFO [main] client.AsyncProcess: #2, waiting for > some tasks to finish. 
Expected max=0, tasksInProgress=35 > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: BindException: 1 time, > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208) > at > org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211) > {code} > After some digging and adding some more logging in code i have notice that > following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel > connection) {code} is never true: > {code} > if (connectionInPool == connection) { > {code} > causing that {code}AsyncRpcChannel{code} connection is never removed from > {code}connections{code} pool in case rs fails. > After changing this condition to: > {code} > if (connectionInPool.address.equals(connection.address)) { > {code} > issue was resolved and client was removing failed server from connections > pool. > I will attach patch after running some more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot
[ https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-14128: Status: Patch Available (was: Open) > Fix inability to run Multiple MR over the same Snapshot > --- > > Key: HBASE-14128 > URL: https://issues.apache.org/jira/browse/HBASE-14128 > Project: HBase > Issue Type: Bug > Components: mapreduce, snapshots >Reporter: Matteo Bertozzi >Assignee: santosh kumar >Priority: Minor > Labels: beginner, noob > Attachments: HBASE-14128-v0.patch > > > from the list, running multiple MR over the same snapshot does not work > {code} > public static void copySnapshotForScanner(Configuration conf, FileSystem .. > RestoreSnapshotHelper helper = new RestoreSnapshotHelper(conf, fs, > manifest, manifest.getTableDescriptor(), restoreDir, monitor, status); > {code} > the problem is that manifest.getTableDescriptor() will try to clone the > snapshot with the same target name. ending up in "file already exist" > exceptions. > we just need to clone that descriptor and generate a new target table name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14128) Fix inability to run Multiple MR over the same Snapshot
[ https://issues.apache.org/jira/browse/HBASE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-14128: Attachment: HBASE-14128-v0.patch This was much harder than I thought... Attached a v0 which should solve the htd problem, but the code can (and should) be cleaned up more. TableSnapshotScanner was the easiest one to clean up: I removed the double snapshot manifest read and the use of the wrong HRI (the one from snapshot instead of restore). The SnapshotFormatImpl still has the double manifest reading, but I fixed the wrong HRI use in a hacky way. The problem here is that RestoreHelper is called in setInput() and I don't see an easy way to pass the list of HRI that we have there to the split method. > Fix inability to run Multiple MR over the same Snapshot > --- > > Key: HBASE-14128 > URL: https://issues.apache.org/jira/browse/HBASE-14128 > Project: HBase > Issue Type: Bug > Components: mapreduce, snapshots >Reporter: Matteo Bertozzi >Assignee: santosh kumar >Priority: Minor > Labels: beginner, noob > Attachments: HBASE-14128-v0.patch > > > from the list, running multiple MR over the same snapshot does not work > {code} > public static void copySnapshotForScanner(Configuration conf, FileSystem .. > RestoreSnapshotHelper helper = new RestoreSnapshotHelper(conf, fs, > manifest, manifest.getTableDescriptor(), restoreDir, monitor, status); > {code} > the problem is that manifest.getTableDescriptor() will try to clone the > snapshot with the same target name. ending up in "file already exist" > exceptions. > we just need to clone that descriptor and generate a new target table name -- This message was sent by Atlassian JIRA (v6.3.4#6332)
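The fix direction from the issue description (clone the descriptor and generate a new target table name so concurrent restores never collide on "file already exist") can be outlined as below. TableDescriptor and the naming scheme here are hypothetical stand-ins, not the real HTableDescriptor API:

```java
import java.util.UUID;

// Each MR job restoring the same snapshot gets its own descriptor copy under
// a fresh, unique target name, so two concurrent jobs never write restore
// files to the same path.
class RestoreNaming {
    static class TableDescriptor {
        final String tableName;
        TableDescriptor(String tableName) { this.tableName = tableName; }
    }

    // Derive a unique restore-target name from the snapshot's table name.
    static TableDescriptor cloneWithUniqueName(TableDescriptor snapshotHtd) {
        String unique = snapshotHtd.tableName + "-restore-" + UUID.randomUUID();
        return new TableDescriptor(unique);
    }
}
```

The essential property is only that each restore's target name is unique per job; any collision-free scheme (UUID, job id, timestamp) would do.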
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790758#comment-14790758 ] Ted Yu commented on HBASE-14411: TestWALLockup passed here: https://builds.apache.org/job/HBase-1.3/jdk=latest1.7,label=Hadoop/178/console > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch > > > If we set hbase.wal.provider to multiwal in > hbase-server/src/test/resources/hbase-site.xml which allows us to use > BoundedRegionGroupingProvider in UT, we will observe below failures in > current code base: > {noformat} > Failed tests: > TestHLogRecordReader>TestWALRecordReader.testPartialRead:164 expected:<1> > but was:<2> > TestHLogRecordReader>TestWALRecordReader.testWALRecordReader:216 > expected:<2> but was:<3> > TestWALRecordReader.testPartialRead:164 expected:<1> but was:<2> > TestWALRecordReader.testWALRecordReader:216 expected:<2> but was:<3> > TestDistributedLogSplitting.testRecoveredEdits:276 edits dir should have > more than a single file in it. 
instead has 1 > TestAtomicOperation.testMultiRowMutationMultiThreads:499 expected:<0> but > was:<1> > TestHRegionServerBulkLoad.testAtomicBulkLoad:307 > Expected: is > but: was > TestLogRolling.testCompactionRecordDoesntBlockRolling:611 Should have WAL; > one table is not flushed expected:<1> but was:<0> > TestLogRolling.testLogRollOnDatanodeDeath:359 null > TestLogRolling.testLogRollOnPipelineRestart:472 Missing datanode should've > triggered a log roll > TestReplicationSourceManager.testLogRoll:237 expected:<6> but was:<7> > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestReplicationWALReaderManager.test:155 null > TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 if > skip.errors is false all files should remain in place expected:<11> but > was:<12> > TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong number of files in the > archive log expected:<11> but was:<12> > TestWALSplit.testMovedWALDuringRecovery:810->retryOverHdfsProblem:793 > expected:<11> but was:<12> > TestWALSplit.testRetryOpenDuringRecovery:838->retryOverHdfsProblem:793 > expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testCorruptedLogFilesSkipErrorsFalseDoesNotTouchLogs:594 > if skip.errors is false all files should remain in place expected:<11> but > was:<12> > TestWALSplitCompressed>TestWALSplit.testLogsGetArchivedAfterSplit:649 wrong > number of files in the archive log expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testMovedWALDuringRecovery:810->TestWALSplit.retryOverHdfsProblem:793 > expected:<11> but was:<12> > > TestWALSplitCompressed>TestWALSplit.testRetryOpenDuringRecovery:838->TestWALSplit.retryOverHdfsProblem:793 > expected:<11> but 
was:<12> > {noformat} > While patch for HBASE-14306 could resolve failures of TestHLogRecordReader, > TestReplicationSourceManager and TestReplicationWALReaderManager, this JIRA > will focus on resolving the others -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790751#comment-14790751 ] Ted Yu commented on HBASE-14411: Did you mean TestWALLockup ? I ran it locally before committing the patch - it passed. This patch doesn't change default wal provider to multiwal. So the test failure was not related. > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790748#comment-14790748 ] Elliott Clark commented on HBASE-14411: --- That last test failure on Hadoop QA looks really related. > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790729#comment-14790729 ] stack commented on HBASE-14334: --- +1 On commit, add more to this: HBase external block cache. The above is all the doc I'd see this module getting so say something about when it'd be used and how to enable it. Replicate as the release note on this issue. > Move Memcached block cache in to it's own optional module. > -- > > Key: HBASE-14334 > URL: https://issues.apache.org/jira/browse/HBASE-14334 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14334.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790724#comment-14790724 ] Hudson commented on HBASE-14411: SUCCESS: Integrated in HBase-1.3-IT #160 (See [https://builds.apache.org/job/HBase-1.3-IT/160/]) HBASE-14411 Fix unit test failures when using multiwal as default WAL provider (Yu Li) (tedyu: rev 0452ba09b53fb450c913811b77d74b6035b40ce3) * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DefaultWALProvider.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java * hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALSplit.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12751) Allow RowLock to be reader writer
[ https://issues.apache.org/jira/browse/HBASE-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790723#comment-14790723 ] stack commented on HBASE-12751: --- kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15616//consoleText Fetching the console output from the URL Printing hanging tests Hanging test : org.apache.hadoop.hbase.TestIOFencing Hanging test : org.apache.hadoop.hbase.master.TestDistributedLogSplitting Hanging test : org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay Printing Failing tests Failing test : org.apache.hadoop.hbase.client.TestReplicasClient Let me look into these. > Allow RowLock to be reader writer > - > > Key: HBASE-12751 > URL: https://issues.apache.org/jira/browse/HBASE-12751 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.3.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.3.0 > > Attachments: 12751.rebased.v25.txt, 12751.rebased.v26.txt, > 12751.rebased.v26.txt, 12751.rebased.v27.txt, 12751.rebased.v29.txt, > 12751.rebased.v31.txt, 12751.rebased.v32.txt, 12751.rebased.v32.txt, > 12751.rebased.v33.txt, 12751.rebased.v34.txt, 12751.rebased.v35.txt, > 12751.rebased.v35.txt, 12751.rebased.v35.txt, 12751.v37.txt, 12751.v38.txt, > 12751v22.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, 12751v23.txt, > 12751v36.txt, HBASE-12751-v1.patch, HBASE-12751-v10.patch, > HBASE-12751-v10.patch, HBASE-12751-v11.patch, HBASE-12751-v12.patch, > HBASE-12751-v13.patch, HBASE-12751-v14.patch, HBASE-12751-v15.patch, > HBASE-12751-v16.patch, HBASE-12751-v17.patch, HBASE-12751-v18.patch, > HBASE-12751-v19 (1).patch, HBASE-12751-v19.patch, HBASE-12751-v2.patch, > HBASE-12751-v20.patch, HBASE-12751-v20.patch, HBASE-12751-v21.patch, > HBASE-12751-v3.patch, 
HBASE-12751-v4.patch, HBASE-12751-v5.patch, > HBASE-12751-v6.patch, HBASE-12751-v7.patch, HBASE-12751-v8.patch, > HBASE-12751-v9.patch, HBASE-12751.patch > > > Right now every write operation grabs a row lock. This is to prevent values > from changing during a read modify write operation (increment or check and > put). However it limits parallelism in several different scenarios. > If there are several puts to the same row but different columns or stores > then this is very limiting. > If there are puts to the same column then mvcc number should ensure a > consistent ordering. So locking is not needed. > However locking for check and put or increment is still needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
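[Editorial note] The locking scheme the HBASE-12751 description argues for can be sketched with a plain ReentrantReadWriteLock. This is a hypothetical illustration of the idea, not the actual patch: plain puts share the lock (MVCC sequence numbers order them), while read-modify-write operations such as increment and checkAndPut still take it exclusively.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a reader-writer row lock (not the HBASE-12751 code).
public class RowLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // A plain put only needs the shared lock: concurrent puts to the same row
    // may proceed together; MVCC keeps their ordering consistent.
    public boolean tryAcquireForPut() {
        return lock.readLock().tryLock();
    }

    public void releaseForPut() {
        lock.readLock().unlock();
    }

    // Increment / check-and-put still need exclusivity so the value cannot
    // change between the read and the write.
    public boolean tryAcquireForReadModifyWrite() {
        return lock.writeLock().tryLock();
    }

    public void releaseForReadModifyWrite() {
        lock.writeLock().unlock();
    }

    public static void main(String[] args) {
        RowLockSketch row = new RowLockSketch();
        // Two concurrent puts to the same row both get the shared lock.
        if (!row.tryAcquireForPut()) throw new AssertionError();
        if (!row.tryAcquireForPut()) throw new AssertionError();
        // An increment must wait until the puts release the shared lock.
        if (row.tryAcquireForReadModifyWrite()) throw new AssertionError();
        row.releaseForPut();
        row.releaseForPut();
        if (!row.tryAcquireForReadModifyWrite()) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This also shows why, as the description notes, locking cannot be dropped entirely: the exclusive path is still required for the read-modify-write cases.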
[jira] [Commented] (HBASE-14278) Fix NPE that is showing up since HBASE-14274 went in
[ https://issues.apache.org/jira/browse/HBASE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790710#comment-14790710 ] stack commented on HBASE-14278: --- kalashnikov:hbase.git.commit stack$ python dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15617/consoleText Fetching the console output from the URL Printing hanging tests Printing Failing tests Failing test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Failing test : org.apache.hadoop.hbase.client.TestReplicaWithCluster TestReplicaWithCluster I see is showing up as a hang. I'll take a look. The other failure looks unrelated. I'll look at that too. +1 on patch. This emission is ugly, currently spewing all over test runs. Thanks [~eclark] On commit, shove e.getMessage on the end of this log just so we can be sure it's that old faithful, the NPE:
} catch (Exception e) {
  // Ignored. If this errors out it means that someone is double
  // closing the region source and the region is already nulled out.
  LOG.info("Error trying to remove " + toRemove + " from " + this.getClass().getSimpleName());
}
> Fix NPE that is showing up since HBASE-14274 went in > > > Key: HBASE-14278 > URL: https://issues.apache.org/jira/browse/HBASE-14278 > Project: HBase > Issue Type: Sub-task > Components: test >Affects Versions: 2.0.0, 1.2.0, 1.3.0 >Reporter: stack >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14278-v1.patch, HBASE-14278-v2.patch, > HBASE-14278-v3.patch, HBASE-14278-v4.patch, HBASE-14278-v5.patch, > HBASE-14278.patch > > > Saw this in TestDistributedLogSplitting after HBASE-14274 was applied. 
> {code} > 119113 2015-08-20 15:31:10,704 WARN [HBase-Metrics2-1] > impl.MetricsConfig(124): Cannot locate configuration: tried > hadoop-metrics2-hbase.properties,hadoop-metrics2.properties > 119114 2015-08-20 15:31:10,710 ERROR [HBase-Metrics2-1] > lib.MethodMetric$2(118): Error invoking method getBlocksTotal > 119115 java.lang.reflect.InvocationTargetException > 119116 › at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) > 119117 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119118 › at java.lang.reflect.Method.invoke(Method.java:606) > 119119 › at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) > 119120 › at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) > 119121 › at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) > 119122 › at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) > 119123 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) > 119124 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) > 119125 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) > 119126 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) > 119127 › at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) > 119128 › at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522) > 119129 › at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > 119130 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:221) > 119131 › at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:96) > 119132 › at > 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:245) > 119133 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:229) > 119134 › at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > 119135 › at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 119136 › at java.lang.reflect.Method.invoke(Method.java:606) > 119137 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290) > 119138 › at com.sun.proxy.$Proxy13.postStart(Unknown Source) > 119139 › at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185) > 119140 › at > org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:81) > 119141 › at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 119142 › at java.util.concurrent.FutureTask.run
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790698#comment-14790698 ] stack commented on HBASE-10449: --- bq. I expect that if we have more than coreSize calls in timeout (256 vs 60 seconds in our case) then we always have coreSize threads. Say again. I'm not following [~nkeywal] Thanks. bq. ...the protobuf nightmare if you remember Yes. Smile. Need to revive it for here and for doing client timeouts > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan
[ https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790692#comment-14790692 ] stack commented on HBASE-14221: --- bq. . But atleast for a single CF case I think these comparison can be reduced. How does this extend to the MultiCF case? So, about 10% difference for this added complexity? @larsh You are probably interested in this. Why need for two flags? Why not isSingleColumnFamily test not enough? When would we have a single store heap scanner but then a joined heap would have more than one? 5275// Indicates if the storeHeap is formed of only one StoreScanner 5276boolean singleStoreScannerHeap = false; 5277// Indicates if the joinedHeap is formed of only one StoreScanner. 5278boolean singleStoreScannerJoinedHeap = false; Why add a flag here? boolean moreValues = populateResult(results, this.joinedHeap, scannerContext, 5488 joinedContinuationRow); 5497 joinedContinuationRow, singleStoreScannerJoinedHeap); Why not just have the flag be in the scanner context? > Reduce the number of time row comparison is done in a Scan > -- > > Key: HBASE-14221 > URL: https://issues.apache.org/jira/browse/HBASE-14221 > Project: HBase > Issue Type: Sub-task > Components: Scanners >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: HBASE-14221.patch, HBASE-14221_1.patch, > HBASE-14221_1.patch, withmatchingRowspatch.png, withoutmatchingRowspatch.png > > > When we tried to do some profiling with the PE tool found this. > Currently we do row comparisons in 3 places in a simple Scan case. > 1) ScanQueryMatcher > {code} >int ret = this.rowComparator.compareRows(curCell, cell); > if (!this.isReversed) { > if (ret <= -1) { > return MatchCode.DONE; > } else if (ret >= 1) { > // could optimize this, if necessary? > // Could also be called SEEK_TO_CURRENT_ROW, but this > // should be rare/never happens. 
> return MatchCode.SEEK_NEXT_ROW; > } > } else { > if (ret <= -1) { > return MatchCode.SEEK_NEXT_ROW; > } else if (ret >= 1) { > return MatchCode.DONE; > } > } > {code} > 2) In StoreScanner next() while starting to scan the row > {code} > if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || > matcher.curCell == null || > isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) { > this.countPerRow = 0; > matcher.setToNewRow(peeked); > } > {code} > Particularly to see if we are in a new row. > 3) In HRegion > {code} > scannerContext.setKeepProgress(true); > heap.next(results, scannerContext); > scannerContext.setKeepProgress(tmpKeepProgress); > nextKv = heap.peek(); > moreCellsInRow = moreCellsInRow(nextKv, currentRowCell); > {code} > Here again there are cases where we need to careful for a MultiCF case. Was > trying to solve this for the MultiCF case but is having lot of cases to > solve. But atleast for a single CF case I think these comparison can be > reduced. > So for a single CF case in the SQM we are able to find if we have crossed a > row using the code pasted above in SQM. That comparison is definitely needed. > Now in case of a single CF the HRegion is going to have only one element in > the heap and so the 3rd comparison can surely be avoided if the > StoreScanner.next() was over due to MatchCode.DONE caused by SQM. > Coming to the 2nd compareRows that we do in StoreScanner. next() - even that > can be avoided if we know that the previous next() call was over due to a new > row. Doing all this I found that the compareRows in the profiler which was > 19% got reduced to 13%. Initially we can solve for single CF case which can > be extended to MultiCF cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
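[Editorial note] The single-CF optimization described above can be sketched in miniature: if the matcher already stopped with DONE at the row boundary, the outer loop can reuse that signal instead of comparing rows a second time. All names below are hypothetical stand-ins, not the HBase scanner code.

```java
import java.util.Arrays;

// Hypothetical sketch of skipping the redundant row comparison in the
// single-column-family scan path.
public class ScanBoundarySketch {
    static int comparisons = 0;

    // Stand-in for the row comparator; counts how often it runs.
    static boolean sameRow(byte[] a, byte[] b) {
        comparisons++;
        return Arrays.equals(a, b);
    }

    // Naive caller: always re-compares the peeked cell's row.
    static boolean moreCellsInRowNaive(byte[] peeked, byte[] currentRow) {
        return peeked != null && sameRow(peeked, currentRow);
    }

    // Optimized caller: when the matcher already reported DONE for this row,
    // the boundary is known and no second comparison is needed.
    static boolean moreCellsInRowOptimized(byte[] peeked, byte[] currentRow,
                                           boolean matcherSaidDone) {
        if (matcherSaidDone) {
            return false;
        }
        return moreCellsInRowNaive(peeked, currentRow);
    }

    public static void main(String[] args) {
        byte[] rowA = {1};
        byte[] rowB = {2};

        comparisons = 0;
        moreCellsInRowNaive(rowB, rowA);
        if (comparisons != 1) throw new AssertionError();

        comparisons = 0;
        // Matcher already said DONE: no comparison happens at all.
        if (moreCellsInRowOptimized(rowB, rowA, true)) throw new AssertionError();
        if (comparisons != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The multi-CF case is harder precisely because the region heap holds several store scanners, so a DONE from one matcher no longer implies the whole heap crossed the row boundary.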
[jira] [Commented] (HBASE-14411) Fix unit test failures when using multiwal as default WAL provider
[ https://issues.apache.org/jira/browse/HBASE-14411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790684#comment-14790684 ] Hudson commented on HBASE-14411: FAILURE: Integrated in HBase-1.3 #178 (See [https://builds.apache.org/job/HBase-1.3/178/]) HBASE-14411 Fix unit test failures when using multiwal as default WAL provider (Yu Li) (tedyu: rev 0452ba09b53fb450c913811b77d74b6035b40ce3) * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java * hbase-server/src/test/java/org/apache/hadoop/hbase/wal/TestWALSplit.java * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/DefaultWALProvider.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java > Fix unit test failures when using multiwal as default WAL provider > -- > > Key: HBASE-14411 > URL: https://issues.apache.org/jira/browse/HBASE-14411 > Project: HBase > Issue Type: Bug >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14411.branch-1.patch, HBASE-14411.patch, > HBASE-14411_v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790660#comment-14790660 ] Nicolas Liochon commented on HBASE-10449: - > I was thinking that we'd go to core size – say # of cores – and then if one > request a second, we'd just stay at core size because there would be a free > thread when the request-per-second came in (assuming request took a good deal > < a second). I expect that if we have more than coreSize calls in timeout (256 vs 60 seconds in our case) then we always have coreSize threads. > Didn't we have a mock server somewhere such that we could standup a client > with no friction and watch it in operation? I thought we'd make such a > beast Yep, you built one, we used it when we looked at the perf issues in the client (the protobuf nightmare if you remember ;:-)). > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14433: -- Release Note: Tests run with client executors that have core thread count of 4 and a keepalive of 3 seconds. They used to default to 256 core threads and 60 seconds for keepalive. (was: Change the client executor core thread count to be number of processors instead of 256: i.e. the equivalent of the maximum threads allowed on client. The config to set it back to 256 or any other value is "hbase.hconnection.threads.core". Also set it so core is set to default 4 threads in client core in tests (and keepalive is downed from a minute to 3 seconds).) Summary: Set down the client executor core thread count from 256 in tests (was: Set down the client executor core thread count from 256 to number of processors) > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
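[Editorial note] The release note's numbers (core = 4, keepalive = 3 seconds, down from 256 and 60) can be illustrated with a plain ThreadPoolExecutor. This is a stand-in for, not the actual, HBase client executor; it also shows why 256 idle core threads lingered in thread dumps: core threads never time out unless allowCoreThreadTimeOut is set.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the test-only executor settings from the release note above.
public class ClientPoolSketch {
    static ThreadPoolExecutor newTestPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4,                   // core threads (was 256)
            256,                 // max threads
            3, TimeUnit.SECONDS, // keepalive (was 60 seconds)
            new LinkedBlockingQueue<Runnable>());
        // Without this, idle *core* threads never exit -- with core = 256
        // that left up to 256 parked threads per client in test thread dumps.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newTestPool();
        if (pool.getCorePoolSize() != 4) throw new AssertionError();
        if (!pool.allowsCoreThreadTimeOut()) throw new AssertionError();

        for (int i = 0; i < 8; i++) {
            pool.submit(() -> { });
        }
        // An unbounded queue means the pool never grows past core size.
        if (pool.getPoolSize() > 4) throw new AssertionError();

        // After the keepalive elapses, the idle core threads are reclaimed.
        Thread.sleep(5000);
        if (pool.getPoolSize() != 0) throw new AssertionError();
        pool.shutdown();
        System.out.println("ok");
    }
}
```

Note one subtlety visible in the sketch: with an unbounded work queue the "max" of 256 is never reached, so the core size alone determines the steady-state thread count.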
[jira] [Commented] (HBASE-13770) Programmatic JAAS configuration option for secure zookeeper may be broken
[ https://issues.apache.org/jira/browse/HBASE-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790641#comment-14790641 ] Hadoop QA commented on HBASE-13770: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756251/HBASE-13770-0.98.patch against 0.98 branch at commit d2e338181800ae3cef55ddca491901b65259dc7f. ATTACHMENT ID: 12756251 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 23 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3873 checkstyle errors (more than the master's current 3869 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + public static final String ZK_CLIENT_KERBEROS_PRINCIPLE = "hbase.zookeeper.client.kerberos.principal"; + public static final String ZK_SERVER_KERBEROS_PRINCIPLE = "hbase.zookeeper.server.kerberos.principal"; {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15623//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15623//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15623//console This message is automatically generated. > Programmatic JAAS configuration option for secure zookeeper may be broken > - > > Key: HBASE-13770 > URL: https://issues.apache.org/jira/browse/HBASE-13770 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.13, 1.2.0 >Reporter: Andrew Purtell >Assignee: Maddineni Sukumar > Fix For: 0.98.13 > > Attachments: HBASE-13770-0.98.patch, HBASE-13770-v1.patch, > HBASE-13770-v2.patch > > > While verifying the patch fix for HBASE-13768 we were unable to successfully > test the programmatic JAAS configuration option for secure ZooKeeper > integration. Unclear if that was due to a bug or incorrect test configuration. > Update the security section of the online book with clear instructions for > setting up the programmatic JAAS configuration option for secure ZooKeeper > integration. > Verify it works. > Fix as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14433: -- Fix Version/s: 1.3.0 1.2.0 > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14433) Set down the client executor core thread count from 256 in tests
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14433: -- Attachment: 14433v4.reapply.txt Here is what I reapplied under the rubric of this issue. It just changes the config for tests. I applied to 1.2+. > Set down the client executor core thread count from 256 in tests > > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, > 14433v4.reapply.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14433) Set down the client executor core thread count from 256 to number of processors
[ https://issues.apache.org/jira/browse/HBASE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790588#comment-14790588 ] stack commented on HBASE-14433: --- Ok. Reverting the patch I applied last night because discussion ongoing over in HBASE-10449. I'm instead going to just set limits for tests only. > Set down the client executor core thread count from 256 to number of > processors > --- > > Key: HBASE-14433 > URL: https://issues.apache.org/jira/browse/HBASE-14433 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 14433 (1).txt, 14433.txt, 14433v2.txt, 14433v3.txt, > 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt, 14433v3.txt > > > HBASE-10449 upped our core count from 0 to 256 (max is 256). Looking in a > recent test run core dump, I see up to 256 threads per client and all are > idle. At a minimum it makes it hard reading test thread dumps. Trying to > learn more about why we went a core of 256 over in HBASE-10449. Meantime will > try setting down configs for test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790587#comment-14790587 ] stack commented on HBASE-10449: --- Thanks [~nkeywal] bq. We should not see 256 threads, because they should expire already Maybe they spin up inside the keepalive time of 60 seconds. bq. We will still have 60 threads, because each new request will create a new thread until we reach coreSize Well, I was thinking that we'd go to core size -- say # of cores -- and then if one request a second, we'd just stay at core size because there would be a free thread when the request-per-second came in (assuming request took a good deal < a second). Let me look at HBASE-11590. What I saw was each client with hundreds -- up to 256 on one -- threads all in WAITING like follows: {code} "hconnection-0x3065a6a9-shared--pool13-t247" daemon prio=10 tid=0x7f31c1ab2000 nid=0x7718 waiting on condition [0x7f2f9ecec000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0007f841b388> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} ... usually in TestReplicasClient. Here is example: https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/15581/consoleText See zombies on the end. I also have second thoughts on HBASE-14433. 
I am going to change it so we set config for tests only. We need to do more work before we can set the core threads down from max is what I am thinking. Thanks [~nkeywal] I'll look at HBASE-11590. Didn't we have a mock server somewhere such that we could stand up a client with no friction and watch it in operation? I thought we'd make such a beast > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
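[Editor's note] The coreSize point under discussion is observable in plain java.util.concurrent, independent of HBase. The sketch below (hypothetical demo code, not from any HBase patch) shows that a ThreadPoolExecutor keeps creating new worker threads per submitted task until corePoolSize is reached, even when an already-created worker is sitting idle -- which is why a burst of requests inside the 60s keepalive window can leave a client holding corePoolSize idle threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: how many worker threads exist after running
// `tasks` tasks strictly one at a time (so a worker is always free).
class CorePoolDemo {
    static int threadsAfterSequentialTasks(int corePoolSize, int tasks) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, corePoolSize,          // core == max, like the client pool
                60, TimeUnit.SECONDS,                // keepalive under discussion
                new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);           // idle core threads may expire after 60s
        for (int i = 0; i < tasks; i++) {
            // Wait for each task before submitting the next, so an idle
            // worker is always available when the next task arrives.
            pool.submit(() -> { }).get();
        }
        int size = pool.getPoolSize();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        // Even though every task found an idle worker, the executor still
        // spun up one thread per task until corePoolSize was reached.
        System.out.println(threadsAfterSequentialTasks(8, 8)); // prints 8, not 1
    }
}
```

This matches the documented ThreadPoolExecutor contract: a new thread is created for each arriving task while the pool is below corePoolSize, regardless of idle workers.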
[jira] [Commented] (HBASE-14443) Add request parameter to the TooSlow/TooLarge warn message of RpcServer
[ https://issues.apache.org/jira/browse/HBASE-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790528#comment-14790528 ] stack commented on HBASE-14443: --- Anything to make this stuff more useful is welcome (+1 on transform) > Add request parameter to the TooSlow/TooLarge warn message of RpcServer > --- > > Key: HBASE-14443 > URL: https://issues.apache.org/jira/browse/HBASE-14443 > Project: HBase > Issue Type: Improvement > Components: rpc >Reporter: Jianwei Cui >Priority: Minor > Fix For: 1.2.1 > > > The RpcServer will log a warn message for TooSlow or TooLarge request as: > {code} > logResponse(new Object[]{param}, > md.getName(), md.getName() + "(" + param.getClass().getName() + > ")", > (tooLarge ? "TooLarge" : "TooSlow"), > status.getClient(), startTime, processingTime, qTime, > responseSize); > {code} > The RpcServer#logResponse will create the warn message as: > {code} > if (params.length == 2 && server instanceof HRegionServer && > params[0] instanceof byte[] && > params[1] instanceof Operation) { > ... > responseInfo.putAll(((Operation) params[1]).toMap()); > ... > } else if (params.length == 1 && server instanceof HRegionServer && > params[0] instanceof Operation) { > ... > responseInfo.putAll(((Operation) params[0]).toMap()); > ... > } else { > ... > } > {code} > Because the parameter is always a protobuf message, not an instance of > Operation, the request parameter will not be added into the warn message. The > parameter is helpful to find out the problem, for example, knowing the > startRow/endRow is useful for a TooSlow scan. To improve the warn message, we > can transform the protobuf request message to corresponding Operation > subclass object by ProtobufUtil, so that it can be added the warn message. > Suggestion and discussion are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
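[Editor's note] The fallthrough described above can be shown with a minimal sketch. The types here (`Operation`, `Get`, `ScanRequest`) are simplified stand-ins, not the real HBase or protobuf classes; the point is only the dispatch shape of the quoted logResponse code.

```java
// Hypothetical stand-ins: Operation mirrors HBase's loggable operation type,
// ScanRequest stands in for a generated protobuf request message.
class LogResponseDemo {
    static abstract class Operation { abstract String toMap(); }
    static class Get extends Operation { String toMap() { return "row=r1"; } }
    static class ScanRequest { } // no relation to Operation, like protobuf messages

    // Same shape as the quoted dispatch: only Operation params get details.
    static String describeParam(Object param) {
        if (param instanceof Operation) {
            return "detailed:" + ((Operation) param).toMap();
        }
        return "generic"; // protobuf requests always land here, losing details
    }

    public static void main(String[] args) {
        System.out.println(describeParam(new Get()));         // prints detailed:row=r1
        System.out.println(describeParam(new ScanRequest())); // prints generic
    }
}
```

The fix the reporter suggests is to convert the protobuf message into the corresponding Operation before this check, so the first branch is taken.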
[jira] [Commented] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790514#comment-14790514 ] Ted Yu commented on HBASE-14431: lgtm nit: connection.hashCode() is computed twice. You can save the return value in a local variable. > AsyncRpcClient#removeConnection() never removes connection from connections > pool if server fails > > > Key: HBASE-14431 > URL: https://issues.apache.org/jira/browse/HBASE-14431 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2, 1.1.2 >Reporter: Samir Ahmic >Assignee: Samir Ahmic >Priority: Critical > Attachments: HBASE-14431.patch > > > I was playing with master branch in distributed mode (3 rs + master + > backup_master) and notice strange behavior when i was testing this sequence > of events on single rs: /kill/start/run_balancer while client was writing > data to cluster (LoadTestTool). > I have notice that LTT fails with following: > {code} > 2015-09-09 11:05:58,364 INFO [main] client.AsyncProcess: #2, waiting for > some tasks to finish. 
Expected max=0, tasksInProgress=35 > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: BindException: 1 time, > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208) > at > org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211) > {code} > After some digging and adding some more logging in code i have notice that > following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel > connection) {code} is never true: > {code} > if (connectionInPool == connection) { > {code} > causing that {code}AsyncRpcChannel{code} connection is never removed from > {code}connections{code} pool in case rs fails. > After changing this condition to: > {code} > if (connectionInPool.address.equals(connection.address)) { > {code} > issue was resolved and client was removing failed server from connections > pool. > I will attach patch after running some more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14431) AsyncRpcClient#removeConnection() never removes connection from connections pool if server fails
[ https://issues.apache.org/jira/browse/HBASE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samir Ahmic updated HBASE-14431: Attachment: HBASE-14431.patch Here is a patch fixing this issue. I have noticed that we have some 50s pause in the client between detecting that the session has been reset (killing rs) and removing the connection to this server from the connections pool. I will probably open a new ticket addressing this issue once I dig up more info on why this pause is so long > AsyncRpcClient#removeConnection() never removes connection from connections > pool if server fails > > > Key: HBASE-14431 > URL: https://issues.apache.org/jira/browse/HBASE-14431 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2, 1.1.2 >Reporter: Samir Ahmic >Assignee: Samir Ahmic >Priority: Critical > Attachments: HBASE-14431.patch > > > I was playing with master branch in distributed mode (3 rs + master + > backup_master) and notice strange behavior when i was testing this sequence > of events on single rs: /kill/start/run_balancer while client was writing > data to cluster (LoadTestTool). > I have notice that LTT fails with following: > {code} > 2015-09-09 11:05:58,364 INFO [main] client.AsyncProcess: #2, waiting for > some tasks to finish. 
Expected max=0, tasksInProgress=35 > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 > action: BindException: 1 time, > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) > at > org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:208) > at > org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1697) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:211) > {code} > After some digging and adding some more logging in code i have notice that > following condition in {code}AsyncRpcClient.removeConnection(AsyncRpcChannel > connection) {code} is never true: > {code} > if (connectionInPool == connection) { > {code} > causing that {code}AsyncRpcChannel{code} connection is never removed from > {code}connections{code} pool in case rs fails. > After changing this condition to: > {code} > if (connectionInPool.address.equals(connection.address)) { > {code} > issue was resolved and client was removing failed server from connections > pool. > I will attach patch after running some more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
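[Editor's note] The bug pattern above -- identity comparison (`==`) against a pool entry that has since been recreated -- can be reproduced outside HBase. The sketch below is hypothetical demo code (Conn is a stand-in, not the real AsyncRpcChannel) contrasting the faulty identity check with the address-based fix from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a connection pool keyed by hash code.
class ConnectionPoolDemo {
    static class Conn {
        final String address;
        Conn(String address) { this.address = address; }
    }

    static final Map<Integer, Conn> connections = new HashMap<>();

    // Faulty variant, shaped like the quoted code: compares object identity,
    // which is never true once the caller holds a recreated Conn object.
    static boolean removeByIdentity(int key, Conn conn) {
        Conn inPool = connections.get(key);
        if (inPool == conn) {
            connections.remove(key);
            return true;
        }
        return false; // stale entry survives in the pool
    }

    // Fixed variant, like the patch: compare the server address instead.
    static boolean removeByAddress(int key, Conn conn) {
        Conn inPool = connections.get(key);
        if (inPool != null && inPool.address.equals(conn.address)) {
            connections.remove(key);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        connections.put(1, new Conn("rs1:16020"));
        Conn recreated = new Conn("rs1:16020"); // distinct object, same server
        System.out.println(removeByIdentity(1, recreated)); // prints false
        System.out.println(removeByAddress(1, recreated));  // prints true
    }
}
```

The tradeoff noted in the review still applies: address equality removes whichever entry is pooled for that server, so it must be safe to drop a replacement connection too.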