[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039665#comment-15039665 ]

Yu Li commented on HBASE-14906:
-------------------------------

Thanks for the review and comments Duo!

> Improvements on FlushLargeStoresPolicy
> --------------------------------------
>
>                 Key: HBASE-14906
>                 URL: https://issues.apache.org/jira/browse/HBASE-14906
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, HBASE-14906.v3.patch, HBASE-14906.v4.patch, HBASE-14906.v4.patch
>
> When checking FlushLargeStoresPolicy, we found the following possible points for improvement:
> 1. Currently selectStoresToFlush does the selection regardless of how many families the region actually has, which is unnecessary when there is only a single family
> 2. The default value for hbase.hregion.percolumnfamilyflush.size.lower.bound cannot fit all cases, and requires the user to know the implementation details to set it properly. We propose to use "hbase.hregion.memstore.flush.size/column_family_number" instead:
> {noformat}
> <property>
>   <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name>
>   <value>16777216</value>
>   <description>
>     If FlushLargeStoresPolicy is used and there are multiple column families,
>     then every time that we hit the total memstore limit, we find out all the
>     column families whose memstores exceed a "lower bound" and only flush them
>     while retaining the others in memory. The "lower bound" will be
>     "hbase.hregion.memstore.flush.size / column_family_number" by default
>     unless value of this property is larger than that. If none of the families
>     have their memstore size more than lower bound, all the memstores will be
>     flushed (just as usual).
>   </description>
> </property>
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
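The proposed rule is simple enough to sketch in plain Java. This is an illustrative, self-contained sketch of the selection logic described in the issue, not the actual FlushLargeStoresPolicy code; the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlushSelectionSketch {
    // Effective lower bound: flush.size / family_count by default, unless the
    // configured property value is larger than that (per the proposal above).
    static long lowerBound(long configured, long flushSize, int familyCount) {
        return Math.max(configured, flushSize / familyCount);
    }

    // Select families whose memstore exceeds the lower bound; if none
    // qualify, flush all families (just as usual).
    static List<String> selectStoresToFlush(Map<String, Long> memstoreSizes, long bound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() > bound) {
                selected.add(e.getKey());
            }
        }
        return selected.isEmpty() ? new ArrayList<>(memstoreSizes.keySet()) : selected;
    }

    public static void main(String[] args) {
        // 128 MB region flush size, 4 families -> 32 MB bound wins over 16 MB configured.
        long bound = lowerBound(16777216L, 134217728L, 4);
        System.out.println("lower bound = " + bound);
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("cf1", 40L << 20); // 40 MB, above the bound
        sizes.put("cf2", 1L << 20);  // 1 MB, retained in memory
        System.out.println(selectStoresToFlush(sizes, bound));
    }
}
```

With a single family the bound equals the full flush size, so the selection degenerates to "flush everything", which is why point 1 above calls the per-family selection unnecessary in that case.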
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039668#comment-15039668 ]

Ted Malaska commented on HBASE-14795:
-------------------------------------

Can we open up a review board for this? Thx

> Enhance the spark-hbase scan operations
> ---------------------------------------
>
>                 Key: HBASE-14795
>                 URL: https://issues.apache.org/jira/browse/HBASE-14795
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>         Attachments: 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch
>
> This is a sub-jira of HBASE-14789. This jira focuses on replacing TableInputFormat with a more custom scan implementation that will make the following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single query, TableInputFormat will scan the outer range spanning the scan start and end, whereas this implementation can be more pointed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
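The cost described in the use case can be illustrated with a toy calculation: one scan over the outer range touches every row between the smallest start key and the largest stop key, while pointed per-range scans touch only the requested rows. A hypothetical sketch (row keys modeled as longs for simplicity; not spark-hbase code):

```java
public class MultiRangeScanSketch {
    // Rows covered by one scan over the outer [min(start), max(stop)) range,
    // which is what scanning the outer range of multiple requested ranges does.
    static long outerRangeRows(long[][] ranges) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long[] r : ranges) {
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
        }
        return max - min;
    }

    // Rows covered when each range is scanned individually ("pointed" scans).
    static long pointedRows(long[][] ranges) {
        long total = 0;
        for (long[] r : ranges) {
            total += r[1] - r[0];
        }
        return total;
    }

    public static void main(String[] args) {
        // Two small ranges far apart on the same table.
        long[][] ranges = { {0, 100}, {10_000, 10_100} };
        System.out.println("outer-range rows:  " + outerRangeRows(ranges)); // 10100
        System.out.println("pointed-scan rows: " + pointedRows(ranges));    // 200
    }
}
```

The gap between the two numbers grows with the distance between the requested ranges, which is why a pointed multi-range implementation helps this workload.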
[jira] [Updated] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
[ https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14926:
--------------------------
    Attachment: 14926.patch

Patch for thrift1 and thrift2 (of course the server implementations are different). The timeout seems to only work for TBoundedThreadPoolServer. Fixed up the thrift examples doc too (it baffled me for a while). I'm a bit stuck on how to manufacture this circumstance in a test; I'd have to kill the client exactly where the server is doing a read... any ideas?

> Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14926
>                 URL: https://issues.apache.org/jira/browse/HBASE-14926
>             Project: HBase
>          Issue Type: Bug
>          Components: Thrift
>    Affects Versions: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
>            Reporter: stack
>         Attachments: 14926.patch
>
> Thrift server is hung. All worker threads are doing this:
> {code}
> "thrift-worker-0" daemon prio=10 tid=0x7f0bb95c2800 nid=0xf6a7 runnable [0x7f0b956e]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         - locked <0x00066d859490> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>         at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
>         at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> They never recover.
> I don't have client side logs.
> We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
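The hang can be reproduced with plain java.net sockets, independent of Thrift: without a socket read timeout, a server-side read on a connection whose client never writes (or has crashed) blocks forever, exactly like the worker threads in the stack trace; with setSoTimeout it fails fast. This is a minimal sketch of the failure mode, not the actual patch:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
    // Returns true if a server-side read on an idle client connection
    // times out instead of blocking forever.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0); // any free port
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket conn = server.accept()) {
            // The missing piece in the hung server: a read timeout on the
            // accepted socket. Without this, read() below never returns.
            conn.setSoTimeout(timeoutMillis);
            try {
                conn.getInputStream().read(); // client never writes anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true; // worker thread gets its thread back
            }
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A server applying such a timeout per accepted connection can reap workers whose clients have crashed, instead of leaking one worker thread per dead client.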
[jira] [Updated] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
[ https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14926:
--------------------------
           Assignee: stack
  Affects Version/s: 1.3.0
                     1.2.0
                     2.0.0
                     1.1.2
                     1.0.3
                     0.98.16
             Status: Patch Available  (was: Open)

> Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14926
>                 URL: https://issues.apache.org/jira/browse/HBASE-14926
>             Project: HBase
>          Issue Type: Bug
>          Components: Thrift
>    Affects Versions: 0.98.16, 1.0.3, 1.1.2, 2.0.0, 1.2.0, 1.3.0
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14926.patch
>
> Thrift server is hung. All worker threads are doing this:
> {code}
> "thrift-worker-0" daemon prio=10 tid=0x7f0bb95c2800 nid=0xf6a7 runnable [0x7f0b956e]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         - locked <0x00066d859490> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>         at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
>         at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> They never recover.
> I don't have client side logs.
> We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041146#comment-15041146 ]

Hudson commented on HBASE-14904:
--------------------------------

SUCCESS: Integrated in HBase-1.3 #413 (See [https://builds.apache.org/job/HBase-1.3/413/])
HBASE-14904 Mark Base[En|De]coder LimitedPrivate and fix binary compat (enis: rev edb8edfeb3564152dfacac0e5fe71ba295df821e)
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseEncoder.java

> Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
> ----------------------------------------------------------------
>
>                 Key: HBASE-14904
>                 URL: https://issues.apache.org/jira/browse/HBASE-14904
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch
>
> PHOENIX-2477 revealed that the changes from HBASE-14501 break binary compatibility in Phoenix compiled with earlier versions of HBase and run against later versions.
> This is one of the areas where the boundary is not clear, but it won't hurt us to fix it.
> The exception trace is:
> {code}
> Exception in thread "main" java.lang.NoSuchFieldError: in
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.<init>(IndexedWALEditCodec.java:106)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.<init>(IndexedWALEditCodec.java:121)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357)
> {code}
> Although {{BaseDecoder.in}} is still there, it got changed to be a class rather than an interface. BaseDecoder is marked Private, thus the binary compat check is not run at all. Not sure whether it would have caught this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041145#comment-15041145 ]

Heng Chen commented on HBASE-14790:
-----------------------------------

Makes sense... Let's just keep this as it was originally. We could implement only the 'acked length' logic; that alone could fix HBASE-14004. As for the performance improvement work, keep going with your fan-out stream here. Thoughts? [~Apache9]

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But in fact, we do not need most of its features if we only want to log the WAL. For example, we do not need pipeline recovery since we could just close the old logger and open a new one. And also, we do not need to write multiple blocks since we could also open a new logger if the old file is too large.
> And the most important thing is that it is hard to handle all the corner cases to avoid data loss or data inconsistency (such as HBASE-14004) when using the original DFSOutputStream due to its complicated logic. And the complicated logic also forces us to use some magical tricks to increase performance. For example, we need to use multiple threads to call {{hflush}} when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when logging the WAL. For correctness, and also for performance.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-14822: -- Attachment: 14822-v4-0.98.txt 0.98 version that simply adds a new PB flag. Master version soon. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041228#comment-15041228 ]

Hudson commented on HBASE-14904:
--------------------------------

FAILURE: Integrated in HBase-Trunk_matrix #530 (See [https://builds.apache.org/job/HBase-Trunk_matrix/530/])
HBASE-14904 Mark Base[En|De]coder LimitedPrivate and fix binary compat (enis: rev b3260423b1f59a0af80f5938339997569c3eb21a)
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseEncoder.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java

> Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
> ----------------------------------------------------------------
>
>                 Key: HBASE-14904
>                 URL: https://issues.apache.org/jira/browse/HBASE-14904
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch
>
> PHOENIX-2477 revealed that the changes from HBASE-14501 break binary compatibility in Phoenix compiled with earlier versions of HBase and run against later versions.
> This is one of the areas where the boundary is not clear, but it won't hurt us to fix it.
> The exception trace is:
> {code}
> Exception in thread "main" java.lang.NoSuchFieldError: in
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.<init>(IndexedWALEditCodec.java:106)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.<init>(IndexedWALEditCodec.java:121)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357)
> {code}
> Although {{BaseDecoder.in}} is still there, it got changed to be a class rather than an interface. BaseDecoder is marked Private, thus the binary compat check is not run at all. Not sure whether it would have caught this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041227#comment-15041227 ]

Hudson commented on HBASE-13082:
--------------------------------

FAILURE: Integrated in HBase-Trunk_matrix #530 (See [https://builds.apache.org/job/HBase-Trunk_matrix/530/])
HBASE-13082 Coarsen StoreScanner locks to RegionScanner (Ram) (ramkrishna: rev 8b3d1f144408e4a7a014c5ac46418c9e91b9b0db)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionReplayEvents.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestSnapshotFromMaster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreFileManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEncryptionKeyRotation.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStripeStoreFileManager.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestCompactedHFilesDischarger.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactedHFilesDischarger.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicas.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedStoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFileManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java

> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
>                 Key: HBASE-13082
>                 URL: https://issues.apache.org/jira/browse/HBASE-13082
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, HBASE-13082_12.patch, HBASE-13082_13.patch, HBASE-13082_14.patch, HBASE-13082_15.patch, HBASE-13082_16.patch, HBASE-13082_17.patch, HBASE-13082_18.patch, HBASE-13082_19.patch, HBASE-13082_1_WIP.patch, HBASE-13082_2.pdf, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, HBASE-13082_withoutpatch.jpg, HBASE-13082_withpatch.jpg, LockVsSynchronized.java, gc.png, gc.png, gc.png, hits.png, next.png, next.png
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking contract. For example Phoenix does not lock RegionScanner.nextRaw() as required in the documentation (not picking on Phoenix, this one is my fault as I told them it's OK)
> * possible starving of flushes and compactions with heavy read load. RegionScanner operations would keep getting the locks and the flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
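The lock-coarsening idea in the issue description can be sketched in miniature: instead of the inner store-level scanner synchronizing on every call (one fence per cell), the outer region-level scanner takes one lock per batch and drives an unsynchronized inner scanner inside it. The classes below are hypothetical stand-ins, not the HBase StoreScanner/RegionScanner code:

```java
import java.util.Arrays;

public class LockCoarseningSketch {
    // Fine-grained variant: the inner scanner synchronizes on itself
    // for every single call (memory fence per cell).
    static class FineGrainedStoreScanner {
        private long next;
        synchronized long next() { return next++; }
    }

    // Coarsened variant: the inner scanner does no locking of its own...
    static class PlainStoreScanner {
        private long next;
        long next() { return next++; }
    }

    // ...because the region-level scanner takes one lock for the whole
    // batch and drives the store scanner inside it. Correctness then
    // depends on every caller going through the synchronized outer method,
    // which is exactly the "locking contract" drawback noted above.
    static class RegionScannerSketch {
        private final PlainStoreScanner store = new PlainStoreScanner();
        synchronized long[] nextBatch(int n) {
            long[] out = new long[n];
            for (int i = 0; i < n; i++) {
                out[i] = store.next();
            }
            return out;
        }
    }

    public static void main(String[] args) {
        RegionScannerSketch region = new RegionScannerSketch();
        System.out.println(Arrays.toString(region.nextBatch(3))); // [0, 1, 2]
    }
}
```

The starvation drawback also falls out of this shape: while one long batch holds the coarse lock, anything else that needs it (a flush or compaction finalizing files, in the real system) must wait.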
[jira] [Updated] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-14904:
----------------------------------
    Resolution: Fixed
  Hadoop Flags: Reviewed
        Status: Resolved  (was: Patch Available)

Pushed this to 0.98+. Thanks for looking.

> Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
> ----------------------------------------------------------------
>
>                 Key: HBASE-14904
>                 URL: https://issues.apache.org/jira/browse/HBASE-14904
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch
>
> PHOENIX-2477 revealed that the changes from HBASE-14501 break binary compatibility in Phoenix compiled with earlier versions of HBase and run against later versions.
> This is one of the areas where the boundary is not clear, but it won't hurt us to fix it.
> The exception trace is:
> {code}
> Exception in thread "main" java.lang.NoSuchFieldError: in
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.<init>(IndexedWALEditCodec.java:106)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.<init>(IndexedWALEditCodec.java:121)
>         at org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269)
>         at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422)
>         at org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357)
> {code}
> Although {{BaseDecoder.in}} is still there, it got changed to be a class rather than an interface. BaseDecoder is marked Private, thus the binary compat check is not run at all. Not sure whether it would have caught this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15040776#comment-15040776 ]

Vikas Vishwakarma commented on HBASE-14869:
-------------------------------------------

For the metrics I am appending _SizeRangeCount_ and _TimeRangeCount_ to each metric name, so the Range metrics are easy to identify by these fixed patterns, which differentiate them from all other metrics. Also, by matching on /Size/ and /Time/ it will be easy to process each metric accordingly as a time or size metric.

> Better request latency histograms
> ---------------------------------
>
>                 Key: HBASE-14869
>                 URL: https://issues.apache.org/jira/browse/HBASE-14869
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>            Assignee: Vikas Vishwakarma
>             Fix For: 2.0.0, 1.3.0, 0.98.17
>
>         Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png
>
> I just discussed this with a colleague.
> The get, put, etc. histograms that each region server keeps are somewhat useless (depending on what you want to achieve, of course), as they are aggregated and calculated by each region server.
> It would be better to record the number of requests in certain latency bands in addition to what we do now.
> For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example; this should be configurable).
> That way we can do further calculations after the fact, and answer questions like: How often did we miss our SLA? Percentage of requests that missed an SLA, etc.
> Comments?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
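The banded-count idea in the issue can be sketched as follows. This is an illustrative standalone sketch, not the HBase metrics code; the band boundaries match the example bands in the description and the class name is made up:

```java
public class LatencyBandsSketch {
    // Upper bounds (ms) of each band; the last band is open-ended (> 1000 ms).
    // In the real feature these would be configurable, as the issue proposes.
    static final long[] BOUNDS = {5, 10, 20, 50, 100, 1000};

    // Index of the band a given latency falls into.
    static int bandFor(long latencyMs) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (latencyMs <= BOUNDS[i]) {
                return i;
            }
        }
        return BOUNDS.length; // > 1000 ms
    }

    // "After the fact" calculation the issue asks for: fraction of requests
    // slower than an SLA, counting only bands whose lower edge is at or
    // above the SLA threshold (a band-granular approximation).
    static double missedSlaFraction(long[] counts, long slaMs) {
        long total = 0, missed = 0;
        for (int i = 0; i < counts.length; i++) {
            total += counts[i];
            long lowerEdge = (i == 0) ? 0 : BOUNDS[i - 1];
            if (lowerEdge >= slaMs) {
                missed += counts[i];
            }
        }
        return total == 0 ? 0.0 : (double) missed / total;
    }

    public static void main(String[] args) {
        long[] counts = new long[BOUNDS.length + 1];
        for (long latency : new long[]{3, 7, 15, 60, 400, 2000}) {
            counts[bandFor(latency)]++;
        }
        System.out.println("missed 100ms SLA: " + missedSlaFraction(counts, 100));
    }
}
```

Because only per-band counts are stored, the SLA question can be answered later for any threshold that coincides with a band boundary, without re-collecting latencies.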
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041196#comment-15041196 ]

Heng Chen commented on HBASE-14790:
-----------------------------------

{quote}
DataStreamer#block tracks the "number of bytes acked". It is returned by DFSOutputStream#getBlock
{quote}
Bad news... This method is not public [~zhz]

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But in fact, we do not need most of its features if we only want to log the WAL. For example, we do not need pipeline recovery since we could just close the old logger and open a new one. And also, we do not need to write multiple blocks since we could also open a new logger if the old file is too large.
> And the most important thing is that it is hard to handle all the corner cases to avoid data loss or data inconsistency (such as HBASE-14004) when using the original DFSOutputStream due to its complicated logic. And the complicated logic also forces us to use some magical tricks to increase performance. For example, we need to use multiple threads to call {{hflush}} when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when logging the WAL. For correctness, and also for performance.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
stack created HBASE-14926:
--------------------------

             Summary: Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
                 Key: HBASE-14926
                 URL: https://issues.apache.org/jira/browse/HBASE-14926
             Project: HBase
          Issue Type: Bug
          Components: Thrift
            Reporter: stack

Thrift server is hung. All worker threads are doing this:

{code}
"thrift-worker-0" daemon prio=10 tid=0x7f0bb95c2800 nid=0xf6a7 runnable [0x7f0b956e]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x00066d859490> (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
        at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
        at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

They never recover.

I don't have client side logs.

We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13857) Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero
[ https://issues.apache.org/jira/browse/HBASE-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039698#comment-15039698 ]

Hudson commented on HBASE-13857:
--------------------------------

FAILURE: Integrated in HBase-Trunk_matrix #529 (See [https://builds.apache.org/job/HBase-Trunk_matrix/529/])
HBASE-13857 Slow WAL Append count in ServerMetricsTmpl.jamon is (stack: rev 51503efcf05be734c14200233d5f1495e4c2c3f1)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
* hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWAL.java
* hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapper.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWALSourceImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperStub.java
* hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/wal/MetricsWALSource.java

> Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero
> ---------------------------------------------------------------------
>
>                 Key: HBASE-13857
>                 URL: https://issues.apache.org/jira/browse/HBASE-13857
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, UI
>    Affects Versions: 0.98.0
>            Reporter: Lars George
>            Assignee: Vrishal Kulkarni
>              Labels: beginner
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13857.patch
>
> The template has this (markup elided):
> {noformat}
> ...
> Slow WAL Append Count
> ...
> <% 0 %>
> ...
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039718#comment-15039718 ] Duo Zhang commented on HBASE-14790: --- {quote} 2. dn1 received the WAL entry, and it is read by the ReplicationSource and replicated to the slave cluster. 3. dn1 and the rs both crash; dn2 and dn3 have not received this WAL entry yet, and the rs has not bumped the GS of this block yet. 4. The NameNode completes the file with a length that does not contain this WAL entry, since the GS of the blocks on dn2 and dn3 is correct and the NameNode does not know there used to be a block with a longer length. {quote} In a fan-out implementation this problem is obvious, but in a pipelined implementation it is not that straightforward, and I used to think I was wrong and this could not happen in a pipelined implementation. Data only becomes visible on a datanode after it receives the downstream ack. So if the pipeline is dn1->dn2->dn3, then dn3 is the first datanode that makes data visible to the client, and usually we assume the data has also been written to dn1 and dn2. But, perhaps for performance reasons, {{BlockReceiver}} sends a packet to the downstream mirror before writing it to local disk. So it can happen that dn3 makes the data visible and it is read by the client, but dn1 and dn2 crash before writing the data to local disk. Then let us kill the client and dn3, and restart dn1 and dn2, whoops... And I had a discussion with my workmate [~yangzhe1991]: we think that if we allow duplicate WAL entries in HBase, then the pipeline recovery part could also be moved to a background thread. We could just rewrite the WAL entries after the acked point to the new file, which could also reduce recovery latency. And for keeping an "acked length", I think we could make use of the fsync method in HDFS. We could call fsync asynchronously to update the length on the namenode.
The replication source should not read beyond the length obtained from the namenode (do not trust the visible length read from the datanode). The advantage here is that when a region server crashes, we can still get this value from the namenode, and the file will eventually be closed by someone, so the length will finally be correct. Thanks. > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency (such as HBASE-14004) when > using the original DFSOutputStream due to its complicated logic. And the > complicated logic also forces us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5, not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
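The "acked length" idea can be sketched as a toy model: the writer tracks both the hflushed (datanode-visible) length and the length durably reported to the NameNode, and the replication source trusts only the latter. All class and method names below are illustrative, not the real HBase/HDFS API; in real HDFS the NameNode-side length update would go through hsync with the UPDATE_LENGTH sync flag.

```java
// Simplified model of the proposal above: replication must never ship
// data past the point the NameNode knows about, because only that
// length survives a crash of the writer and its datanodes.
public class AckedLengthModel {
    private long flushedLength; // hflushed: visible on datanodes, may be lost
    private long ackedLength;   // reported to the NameNode: survives crashes

    // A WAL append followed by hflush: fast, but not durable.
    public void append(int entryLength) {
        flushedLength += entryLength;
    }

    // Models the asynchronous fsync that updates the NameNode's length.
    public void fsync() {
        ackedLength = flushedLength;
    }

    // The replication source reads at most this far.
    public long replicableLength() {
        return ackedLength;
    }

    public long visibleLength() {
        return flushedLength;
    }
}
```

The gap between visibleLength() and replicableLength() is exactly the data the slave cluster must not see yet: it could vanish if the master crashes before the next fsync.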
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039719#comment-15039719 ] Andrew Purtell commented on HBASE-14869: Thanks Vikas. It would be a shame if we wished to tweak the naming after this is committed, that's all. Not worried about more than that. > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039738#comment-15039738 ] Lars Hofhansl commented on HBASE-14869: --- Cool. The metric name is the only open issue. If nobody else chimes in, I'm good with committing. Maybe [~vik.karma] can report how hard it was to make sense of these new metrics in the automated scripts. In the end any naming is probably fine. The main part I wasn't sure about was the "greater than X" naming. Recall this scheme: "Get_0-1", "Get_1-3", "Get_10-30", etc., and "Get_>60" > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
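As a rough illustration of the band scheme and the SLA questions in the description, the sketch below derives a band label from a latency and computes an SLA-miss percentage from per-band counters. The cut points mirror the "Get_0-1" ... "Get_>60" naming discussed in the comment, but the values and helper names are otherwise made up, not the metrics actually emitted by the patch.

```java
public class LatencyBands {
    // Band upper bounds in ms, echoing the "Get_0-1", "Get_1-3", ...,
    // "Get_>60" scheme above; in the real feature these would be
    // configurable.
    private static final long[] BOUNDS = {1, 3, 10, 30, 60};

    // Maps one observed latency to its band's metric name.
    public static String bandLabel(String op, long latencyMs) {
        long lower = 0;
        for (long upper : BOUNDS) {
            if (latencyMs <= upper) {
                return op + "_" + lower + "-" + upper;
            }
            lower = upper;
        }
        // Everything past the last bound lands in the open-ended band.
        return op + "_>" + BOUNDS[BOUNDS.length - 1];
    }

    // With per-band counters, "how often did we miss our SLA?" is just
    // the share of requests in bands at or above the first miss band.
    public static double slaMissPercent(long[] bandCounts, int firstMissBand) {
        long total = 0, missed = 0;
        for (int i = 0; i < bandCounts.length; i++) {
            total += bandCounts[i];
            if (i >= firstMissBand) missed += bandCounts[i];
        }
        return total == 0 ? 0.0 : 100.0 * missed / total;
    }
}
```

This is the after-the-fact calculation the description asks for: unlike a server-side percentile, band counts can be re-aggregated across servers and time windows.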
[jira] [Commented] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040796#comment-15040796 ] Hudson commented on HBASE-14904: SUCCESS: Integrated in HBase-1.3-IT #354 (See [https://builds.apache.org/job/HBase-1.3-IT/354/]) HBASE-14904 Mark Base[En|De]coder LimitedPrivate and fix binary compat (enis: rev edb8edfeb3564152dfacac0e5fe71ba295df821e) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseEncoder.java * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java > Mark Base[En|De]coder LimitedPrivate and fix binary compat issue > > > Key: HBASE-14904 > URL: https://issues.apache.org/jira/browse/HBASE-14904 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch > > > PHOENIX-2477 revealed that the changes from HBASE-14501 break binary > compatibility in Phoenix compiled with earlier versions of HBase and run > against later versions. > This is one of the areas where the boundary is not clear, but it won't hurt us > to fix it.
> The exception trace is: > {code} > Exception in thread "main" java.lang.NoSuchFieldError: in > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.(IndexedWALEditCodec.java:106) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.(IndexedWALEditCodec.java:121) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292) > at > org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357) > {code} > Although {{BaseDecoder.in}} is still there, it got changed to be a class > rather than an interface. BaseDecoder is marked Private, thus the binary > compat check is not run at all. Not sure whether it would have caught this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
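Why does a field that is "still there" raise NoSuchFieldError? The JVM resolves a field reference by name and descriptor (the declared type), so changing only the field's declared type breaks bytecode compiled against the old shape. The sketch below uses hypothetical old/new decoder shapes purely to show that the declared type is part of what reflection (and field resolution) sees; the actual HBASE-14501 change differs in detail.

```java
public class FieldCompatDemo {
    // Hypothetical "old" shape: the protected field is typed as the
    // general stream type that downstream bytecode was compiled against.
    static class OldBaseDecoder {
        protected java.io.InputStream in;
    }

    // Hypothetical "new" shape: the field's declared type was narrowed,
    // which changes the field descriptor in the class file even though
    // a field named "in" still exists.
    static class NewBaseDecoder {
        protected java.io.ByteArrayInputStream in;
    }

    // Field resolution matches name AND declared type, so code expecting
    // the old descriptor fails with NoSuchFieldError at runtime against
    // the new class, exactly as in the Phoenix trace above.
    public static String fieldTypeName(Class<?> c) {
        try {
            return c.getDeclaredField("in").getType().getName();
        } catch (NoSuchFieldException e) {
            return "missing";
        }
    }
}
```

This is also why source compatibility (the code still compiles) says nothing here: the break only shows up when old jars meet new classes at runtime.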
[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13082: --- Fix Version/s: 2.0.0 > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, > HBASE-13082_12.patch, HBASE-13082_13.patch, HBASE-13082_14.patch, > HBASE-13082_15.patch, HBASE-13082_16.patch, HBASE-13082_17.patch, > HBASE-13082_18.patch, HBASE-13082_19.patch, HBASE-13082_1_WIP.patch, > HBASE-13082_2.pdf, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, > HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, > HBASE-13082_withoutpatch.jpg, HBASE-13082_withpatch.jpg, > LockVsSynchronized.java, gc.png, gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13082: --- Release Note: After this JIRA we will not be doing any scanner reset after compaction during the course of a scan. The files that were compacted will continue to be used in the scan process. The compacted files will be archived by a background thread that runs every 2 minutes by default, and only when there are no active scanners on those compacted files. The above duration can be controlled using the knob 'hbase.hfile.compactions.cleaner.interval'. > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, > HBASE-13082_12.patch, HBASE-13082_13.patch, HBASE-13082_14.patch, > HBASE-13082_15.patch, HBASE-13082_16.patch, HBASE-13082_17.patch, > HBASE-13082_18.patch, HBASE-13082_19.patch, HBASE-13082_1_WIP.patch, > HBASE-13082_2.pdf, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, > HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, > HBASE-13082_withoutpatch.jpg, HBASE-13082_withpatch.jpg, > LockVsSynchronized.java, gc.png, gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left off. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract.
For example Phoenix does not lock RegionScanner.nextRaw() as > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able to finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041185#comment-15041185 ] stack commented on HBASE-14790: --- bq. ReplicationSource should ask this length first before reading and do not read beyond it. If we have this logic, Doing this would be an improvement over the current way we do replication -- fewer NN ops -- where we open the file, read till EOF, close, then do the same again to see if anything new has been added to the file. bq. ...we could reset the acked length if needed and then move the remaining operations of closing file to a background thread to reduce latency. Thoughts? stack This is clean up of a broken WAL? This is being able to ask each DN what it thinks the length is? While this is going on, we would be holding on to the hbase handlers, not letting responses go back to the client? Would we have to do some weird accounting where three clients A, B, and C have each written an edit, and then the length we get back from existing DNs after a crash, say, does not include the edit written by client C... we'll have to figure out how to fail client C's write (though we'd moved on from append and were trying to sync/hflush the append)? bq. We could just rewrite the WAL entries after acked point to the new file, this could also reduce the recovery latency. I think we can do this currently in the multi WAL case... would have to check (or at least one implementation that may not be the one that landed, used to do this). It would keep around the edits because it would have a standby WAL and if the current WAL was 'slow', we'd throw it away and then add the outstanding edits to the new WAL and away we go again (I can dig it up... ) bq. The replication source should not read beyond the length gotten from namenode(do not trust the visible length read from datanode). This would be lots of NN ops? (In a subsequent comment you say this... nvm) bq.
The advantage here is when region server crashes, we could still get this value from namenode, and the file will be closed eventually by someone so the length will finally be correct. This would be sweet though (could do away with keeping replication lengths up in zk?) bq. There will always be some situation that we could not know there is data loss unless we call fsync every time to update length on namenode when writing WAL I think. Yes. This is the case before your patch though. We should also get some experience of what it's like trying an fsync'd WAL... > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency (such as HBASE-14004) when > using the original DFSOutputStream due to its complicated logic. And the > complicated logic also forces us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5, not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039688#comment-15039688 ] Hudson commented on HBASE-14905: FAILURE: Integrated in HBase-1.3-IT #353 (See [https://builds.apache.org/job/HBase-1.3-IT/353/]) HBASE-14905 VerifyReplication does not honour versions option (Vishal (tedyu: rev b001019d9bca43586de13bd7df72235d56d36503) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > GOODROWS=1 > Does not 
show any mismatch, though ideally it should. This is because in the > VerifyReplication class maxVersions is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
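The comparison problem can be modelled without HBase: if each side's cells (newest first) are truncated to maxVersions before comparing, then comparing only the newest version, which is effectively what the unpatched tool did when the versions option was not propagated into the scan, reports a match for the rows in the description, while honouring --versions exposes the mismatch. The helper below is an illustration, not the VerifyReplication code.

```java
import java.util.List;

public class VersionCompare {
    // Models the effect of honouring --versions: compare only the
    // newest maxVersions cells of each side. Timestamp lists here
    // stand in for raw scan results, newest first.
    public static boolean rowsMatch(List<Long> sourceTs, List<Long> targetTs,
                                    int maxVersions) {
        List<Long> s = sourceTs.subList(0, Math.min(maxVersions, sourceTs.size()));
        List<Long> t = targetTs.subList(0, Math.min(maxVersions, targetTs.size()));
        return s.equals(t);
    }
}
```

Using the timestamps from the description (source has 3 versions, target 5, newest version identical on both sides): with only one version compared the row looks good, which is the GOODROWS=1 result above, but with versions honoured the extra target versions surface as a mismatch.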
[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13082: --- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master. Thanks for the reviews [~stack] and [~anoop.hbase] and others for providing feedback on the patch. > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, > HBASE-13082_12.patch, HBASE-13082_13.patch, HBASE-13082_14.patch, > HBASE-13082_15.patch, HBASE-13082_16.patch, HBASE-13082_17.patch, > HBASE-13082_18.patch, HBASE-13082_19.patch, HBASE-13082_1_WIP.patch, > HBASE-13082_2.pdf, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, > HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, > HBASE-13082_withoutpatch.jpg, HBASE-13082_withpatch.jpg, > LockVsSynchronized.java, gc.png, gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. 
> RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able to finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040781#comment-15040781 ] Heng Chen commented on HBASE-14790: --- {quote} And for keeping an "acked length", I think we could make use of the fsync method in HDFS. We could call fsync asynchronously to update length on namenode. The replication source should not read beyond the length gotten from namenode(do not trust the visible length read from datanode). {quote} So if we can not avoid fsync every time, maybe this way [~Apache9] mentioned is the best solution? Shall we begin? > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041167#comment-15041167 ] Hadoop QA commented on HBASE-14822: ---
{color:red}-1 overall{color}.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16763//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16763//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16763//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16763//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16763//console

This message is automatically generated. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041234#comment-15041234 ] Phil Yang commented on HBASE-14790: --- Considering these factors: hflush is much faster than hsync, especially in pipeline mode, so we have to use hflush for HBase writes. Data on a DN that has been hflushed but not hsynced may be only in memory, not on disk, but it can still be read by a client. So if we hflush data to the DNs, and it is read by the ReplicationSource and transferred to the slave cluster, and then all three DNs and the RS in the master cluster crash, then after replaying WALs the slave will have data that the master has lost... The only way to prevent any data loss is to hsync every time, but that is too slow, and I think most users can bear losing some data to speed up writes but cannot bear the slave having more data than the master. Therefore, I think we can do the following: hflush every time, not hsync; hsync periodically, for example every 1000ms by default? It can be configured by users, and users can also configure hsync on every write, so there will be no data loss unless the disks of all DNs fail... The RS tells the "acked length" to the ReplicationSource, which is the data we have hsynced, not hflushed. The ReplicationSource only transfers data up to the acked length, so the slave cluster will never be inconsistent. WAL reading can handle duplicate entries. On WAL logging, if we get an error on hflush, we open a new file and rewrite the entry, and recover/hsync/close the old file asynchronously. > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one.
And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency (such as HBASE-14004) when > using the original DFSOutputStream due to its complicated logic. And the > complicated logic also forces us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5, not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
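The "hflush every write, hsync periodically" proposal can be modelled deterministically, counting writes instead of milliseconds (all names here are illustrative, not HDFS API): the replication source may only ship entries up to the last hsynced point, so the slave can never get ahead of what the master can recover after a crash.

```java
// Toy model of periodic durability: every write is hflushed (fast,
// pipeline-visible, but possibly lost on crash); every syncInterval-th
// write is also hsynced (durable), advancing the acked point that
// replication is allowed to read up to.
public class PeriodicSyncModel {
    private long written; // entries hflushed so far
    private long acked;   // entries hsynced so far
    private final int syncInterval;

    public PeriodicSyncModel(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    public void write() {
        written++;                        // hflush on every write
        if (written % syncInterval == 0) {
            acked = written;              // periodic hsync
        }
    }

    // Replication never ships past the durable point, so the slave
    // cannot end up with data the crashed master never had on disk.
    public long shippableToSlave() {
        return acked;
    }

    public long writtenSoFar() {
        return written;
    }
}
```

Setting syncInterval to 1 recovers the "hsync each time" mode Phil mentions: nothing is ever lost, at the cost of write latency.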
[jira] [Comment Edited] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040781#comment-15040781 ] Heng Chen edited comment on HBASE-14790 at 12/4/15 6:20 AM: {quote} And for keeping an "acked length", I think we could make use of the fsync method in HDFS. We could call fsync asynchronously to update length on namenode. The replication source should not read beyond the length gotten from namenode(do not trust the visible length read from datanode). {quote} So if we can not avoid fsync every time, maybe this way [~Apache9] mentioned is the best solution? Of course, we should keep 'acked length' in RS. Let's begin? was (Author: chenheng): {quote} And for keeping an "acked length", I think we could make use of the fsync method in HDFS. We could call fsync asynchronously to update length on namenode. The replication source should not read beyond the length gotten from namenode(do not trust the visible length read from datanode). {quote} So if we can not avoid fsync every time, maybe this way [~Apache9] mentioned is the best solution? Shall we begin? > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. 
And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039678#comment-15039678 ] Vikas Vishwakarma commented on HBASE-14869: --- [~apurtell] thanks for the review. We do not have splunk forwarders for the test env, but we already have daily automation scripts running on production logs, extracting operation latencies from the periodic hbase metrics dump, like Mutate_mean and Mutate_95th_percentile. Since this is just an addition to the above metric list, we can easily get these metrics as well using the same script. However, I have tested this only locally on a dev setup; I will set it up on a full cluster, run some long-running and high-load tests to check for perf impact, cpu usage, etc., and update the test results. Sounds ok? If the naming convention or range values used for these metrics need to be changed, I can do so based on suggestions and update the patch. > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable).
> That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
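For what it's worth, the banded counters described above can be sketched roughly like this. This is a hypothetical `LatencyBands` class, not the actual patch; the band boundaries are just the example values from the description and would be configurable in a real implementation:

```java
// Sketch of per-band request counters (hypothetical; band bounds taken from
// the example in the issue: 0-5ms, 6-10ms, 11-20ms, 21-50ms, 51-100ms,
// 101-1000ms, and an open-ended ">1000ms" band).
import java.util.concurrent.atomic.AtomicLongArray;

public class LatencyBands {
    // Upper bounds (inclusive) of each band, in milliseconds; the last band is open-ended.
    static final long[] UPPER_BOUNDS = {5, 10, 20, 50, 100, 1000};

    private final AtomicLongArray counts = new AtomicLongArray(UPPER_BOUNDS.length + 1);

    /** Index of the band that latencyMs falls into. */
    static int bandFor(long latencyMs) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (latencyMs <= UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length; // the ">1000ms" band
    }

    /** Records one request; lock-free increment, cheap enough for the RPC path. */
    public void record(long latencyMs) {
        counts.incrementAndGet(bandFor(latencyMs));
    }

    /** Requests that exceeded the given SLA bound (slaMs must be a band boundary). */
    public long countAbove(long slaMs) {
        long total = 0;
        for (int i = bandFor(slaMs) + 1; i < counts.length(); i++) {
            total += counts.get(i);
        }
        return total;
    }
}
```

An SLA-miss percentage then falls out of `countAbove(slaMs)` divided by the total across all bands, which is exactly the after-the-fact calculation the description asks for.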
[jira] [Commented] (HBASE-14772) Improve zombie detector; be more discerning
[ https://issues.apache.org/jira/browse/HBASE-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039695#comment-15039695 ] Hudson commented on HBASE-14772: FAILURE: Integrated in HBase-Trunk_matrix #529 (See [https://builds.apache.org/job/HBase-Trunk_matrix/529/]) HBASE-14772 Improve zombie detector; be more discerning; part2; (stack: rev 5e430837d3e4a7d159e84964357297c8ab42430d) * dev-support/test-patch.sh * dev-support/zombie-detector.sh HBASE-14772 Improve zombie detector; be more discerning; part2; (stack: rev 7117a2e35d42ef4e3f17b0a8f891fc5200cd0890) * dev-support/zombie-detector.sh > Improve zombie detector; be more discerning > --- > > Key: HBASE-14772 > URL: https://issues.apache.org/jira/browse/HBASE-14772 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 14772v3.patch, zombie.patch, zombiev2.patch > > > Currently, any surefire process with the hbase flag is a potential zombie. > Our zombie check currently takes a reading and if it finds candidate zombies, > it waits 30 seconds and then does another reading. If a concurrent build is > going on, in both cases the zombie detector will come up positive though the > adjacent test run may be making progress; i.e. the cast of surefire processes > may have changed between readings, but our detector just sees the presence of > hbase surefire processes. 
> Here is example: > {code} > Suspicious java process found - waiting 30s to see if there are just slow to > stop > There appear to be 5 zombie tests, they should have been killed by surefire > but survived > 12823 surefirebooter852180186418035480.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7653 surefirebooter8579074445899448699.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 12614 surefirebooter136529596936417090.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7836 surefirebooter3217047564606450448.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 13566 surefirebooter2084039411151963494.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > BEGIN zombies jstack extract > END zombies jstack extract > {code} > 5 is the number of forked processes we allow when doing medium and large > tests so an adjacent build will always show as '5 zombies'. > Need to add discerning if list of processes changes between readings. > Can I also add a tag per build run that all forked processes pick up so I can > look at the current builds progeny only? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
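The cross-reading check proposed above boils down to intersecting the two PID sets. A minimal sketch (a hypothetical `ZombieCheck` helper in Java, not the actual zombie-detector.sh change):

```java
// Sketch of the "discerning" zombie check: a surefire process only counts as
// a zombie if the SAME pid shows up in both readings taken 30s apart. An
// adjacent build whose cast of forked processes changed between readings is
// not flagged.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ZombieCheck {
    /** PIDs present in both readings; these are the real zombie candidates. */
    public static Set<Integer> zombies(List<Integer> firstReading, List<Integer> secondReading) {
        Set<Integer> survivors = new HashSet<>(firstReading);
        survivors.retainAll(new HashSet<>(secondReading));
        return survivors;
    }
}
```

With this, an adjacent build's five forked surefire processes would only be reported if the very same PIDs persisted across both readings, instead of always showing up as "5 zombies".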
[jira] [Commented] (HBASE-14866) VerifyReplication should use peer configuration in peer connection
[ https://issues.apache.org/jira/browse/HBASE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039694#comment-15039694 ] Heng Chen commented on HBASE-14866: --- {quote} This could even be ZKConfig, moved from hbase-client, since it's private. It would be an expansion of its current responsibilities, but doesn't seem too bad. {quote} I like this idea. Moving ZKConfig into hbase-common sounds more reasonable. > VerifyReplication should use peer configuration in peer connection > -- > > Key: HBASE-14866 > URL: https://issues.apache.org/jira/browse/HBASE-14866 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14866.patch, HBASE-14866_v1.patch, > hbase-14866-v4.patch, hbase-14866_v2.patch, hbase-14866_v3.patch > > > VerifyReplication uses the replication peer's configuration to construct the > ZooKeeper quorum address for the peer connection. However, other > configuration properties in the peer's configuration are dropped. It should > merge all configuration properties from the {{ReplicationPeerConfig}} when > creating the peer connection and obtaining credentials for the peer cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts
[ https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039696#comment-15039696 ] Hudson commented on HBASE-14223: FAILURE: Integrated in HBase-Trunk_matrix #529 (See [https://builds.apache.org/job/HBase-Trunk_matrix/529/]) Revert "HBASE-14223 Meta WALs are not cleared if meta region was closed (enis: rev bbd53b846ef6d78740f54f5cea3c73bd992dde09) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALFactory.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java > Meta WALs are not cleared if meta region was closed and RS aborts > - > > Key: HBASE-14223 > URL: https://issues.apache.org/jira/browse/HBASE-14223 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4 > > Attachments: HBASE-14223logs, hbase-14223_v0.patch, > hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, > hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, > hbase-14223_v3-master.patch > > > When an RS opens meta, and later closes it, the WAL(FSHlog) is not closed. > The last WAL file just sits there in the RS WAL directory. If RS stops > gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for > meta is not cleaned. 
It is also not split (which is correct) since master > determines that the RS no longer hosts meta at the time of RS abort. > From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} > directories left uncleaned: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls > /apps/hbase/data/WALs > Found 31 items > drwxr-xr-x - hbase hadoop 0 2015-06-05 01:14 > /apps/hbase/data/WALs/hregion-58203265 > drwxr-xr-x - hbase hadoop 0 2015-06-05 07:54 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 09:28 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting > drwxr-xr-x - hbase hadoop 0 2015-06-05 10:01 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting > ... > {code} > The directories contain WALs from meta: > {code} > [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting > Found 2 items > -rw-r--r-- 3 hbase hadoop 201608 2015-06-05 03:15 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta > -rw-r--r-- 3 hbase hadoop 44420 2015-06-05 04:36 > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > The RS hosted the meta region for some time: > {code} > 2015-06-05 03:14:28,692 INFO [PostOpenDeployTasks:1588230740] > zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper > as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285 > ... 
> 2015-06-05 03:15:17,302 INFO > [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed > hbase:meta,,1.1588230740 > {code} > In between, a WAL is created: > {code} > 2015-06-05 03:15:11,707 INFO > [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: > Rolled WAL > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta > with entries=385, filesize=196.88 KB; new WAL > /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta > {code} > When CM killed the region server later master did not see these WAL files: > {code} > ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 > INFO
[jira] [Commented] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039697#comment-15039697 ] Hudson commented on HBASE-14905: FAILURE: Integrated in HBase-Trunk_matrix #529 (See [https://builds.apache.org/job/HBase-Trunk_matrix/529/]) HBASE-14905 VerifyReplication does not honour versions option (Vishal (tedyu: rev 67ba6598b1be167409a31c4e210b7218823b7beb) * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > 
GOODROWS=1 > Does not show any mismatch. Ideally it should show. This is because in > VerifyReplication Class maxVersion is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039729#comment-15039729 ] Zhan Zhang commented on HBASE-14795: Sure. I cannot submit review in review board, and will consult other people how to do this. > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat for a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case you have multiple scan ranges on a single table with in a single > query. TableInputFormat will scan the the outer range of the scan start and > end range where this implementation can be more pointed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040770#comment-15040770 ] Vikas Vishwakarma commented on HBASE-14869: --- [~lhofhansl] I looked at splunk, where we have GC logs indexed with statements like the one below, which also include the "greater than" symbol between the before-GC and after-GC sizes: ParNew: 218868K->9270K(235968K), 0.0077550 secs] 255143K->45545K(1520064K) I ran rex queries to parse it and verified that it works fine; it was able to extract the proper field, so that should be ok. splunk query: "logline" | rex "->(?<to_gc>[^(]+)" | table _time to_gc > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
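The same extraction can be sketched in plain Java: capture whatever follows `->` up to the next `(`, mirroring the rex query in the comment above. The group name `togc` here is my own, since Java named groups cannot contain underscores like the Splunk field `to_gc`:

```java
// Sketch: extract the post-GC heap size ("to GC" value) from a GC log line,
// equivalent to the Splunk rex query discussed in the comment.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogParse {
    // Same idea as the rex pattern: capture what follows "->" up to the next "(".
    private static final Pattern TO_GC = Pattern.compile("->(?<togc>[^(]+)");

    /** First "->" value in the line, e.g. "9270K", or null if none found. */
    public static String firstToGc(String logline) {
        Matcher m = TO_GC.matcher(logline);
        return m.find() ? m.group("togc") : null;
    }
}
```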
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15040809#comment-15040809 ] Duo Zhang commented on HBASE-14790: --- [~chenheng] We should make a trade-off here. I do not think calling fsync every time is acceptable, since it means the namenode will have the same write pressure as the whole HBase cluster... > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency (such as HBASE-14004) when > using the original DFSOutputStream due to its complicated logic. And the > complicated logic also forces us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5, not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041229#comment-15041229 ] Hudson commented on HBASE-14904: FAILURE: Integrated in HBase-1.1-JDK7 #1612 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1612/]) HBASE-14904 Mark Base[En|De]coder LimitedPrivate and fix binary compat (enis: rev f3d3bd9d3b8eca176166391ef078391816b34bed) * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseEncoder.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java > Mark Base[En|De]coder LimitedPrivate and fix binary compat issue > > > Key: HBASE-14904 > URL: https://issues.apache.org/jira/browse/HBASE-14904 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch > > > PHOENIX-2477 revealed that the changes from HBASE-14501 breaks binary > compatibility in Phoenix compiled with earlier versions of HBase and run > agains later versions. > This is one of the areas that the boundary is not clear, but it won't hurt us > to fix it. 
> The exception trace is: > {code} > Exception in thread "main" java.lang.NoSuchFieldError: in > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.<init>(IndexedWALEditCodec.java:106) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.<init>(IndexedWALEditCodec.java:121) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292) > at > org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357) > {code} > Although {{BaseDecoder.in}} is still there, it got changed to be a class > rather than an interface. BaseDecoder is marked Private, thus the binary > compat check is not run at all. Not sure whether it would have caught this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Khandelwal updated HBASE-14905: -- Attachment: HBASE-14905_v4.patch > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > GOODROWS=1 > Does not show any mismatch. Ideally it should show. This is because in > VerifyReplication Class maxVersion is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14923) VerifyReplication should not mask the exception during result comaprision
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Khandelwal updated HBASE-14923: -- Status: Patch Available (was: Open) > VerifyReplication should not mask the exception during result comaprision > -- > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 0.98.16, 2.0.0 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 0.98.16 > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14923) VerifyReplication should not mask the exception during result comaprision
Vishal Khandelwal created HBASE-14923: - Summary: VerifyReplication should not mask the exception during result comaprision Key: HBASE-14923 URL: https://issues.apache.org/jira/browse/HBASE-14923 Project: HBase Issue Type: Bug Components: tooling Affects Versions: 0.98.16, 2.0.0 Reporter: Vishal Khandelwal Assignee: Vishal Khandelwal Priority: Minor Fix For: 2.0.0, 0.98.16 hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java Line:154 } catch (Exception e) { logFailRowAndIncreaseCounter(context, Counters.CONTENT_DIFFERENT_ROWS, value); } Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
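A minimal sketch of the proposed fix, with stand-ins for the real VerifyReplication internals (the `Counters` enum, `logFailRowAndIncreaseCounter`, and the `lastError` field here are simplified placeholders; the real change would call `LOG.error` instead): the only difference from the snippet quoted above is that the caught exception is surfaced before the row is counted as different, rather than being silently swallowed.

```java
// Sketch of the un-masked comparison failure path. Stand-ins, not the actual
// VerifyReplication code.
public class CompareSketch {
    enum Counters { GOODROWS, CONTENT_DIFFERENT_ROWS }

    static long contentDifferentRows = 0;
    static String lastError = null; // stand-in for LOG.error output

    // Stand-in for VerifyReplication.logFailRowAndIncreaseCounter().
    static void logFailRowAndIncreaseCounter(Counters c, String row) {
        contentDifferentRows++;
    }

    /** Compare source/target payloads for one row; on mismatch, log the cause. */
    public static void compareRow(String row, String source, String target) {
        try {
            if (!source.equals(target)) {
                // In the real tool, Result.compareResults throws on mismatch.
                throw new IllegalStateException("values differ for row " + row);
            }
        } catch (Exception e) {
            // The change proposed in the issue: record WHY the row failed
            // before bumping CONTENT_DIFFERENT_ROWS, instead of dropping e.
            lastError = "Row comparison failed for " + row + ": " + e.getMessage();
            logFailRowAndIncreaseCounter(Counters.CONTENT_DIFFERENT_ROWS, row);
        }
    }
}
```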
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037731#comment-15037731 ] Hadoop QA commented on HBASE-14906: --- {color:green}+1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16750//console This message is automatically generated. > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, > HBASE-14906.v3.patch > > > When checking FlushLargeStoragePolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > > hbase.hregion.percolumnfamilyflush.size.lower.bound > 16777216 > > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. 
The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
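The default proposed in the description above can be sketched as follows. This is a hypothetical helper, not the actual patch (which wires the value through the region's configuration); the constants mirror the usual 128MB `hbase.hregion.memstore.flush.size` default and the 16777216 value shown for `hbase.hregion.percolumnfamilyflush.size.lower.bound`:

```java
// Sketch of the per-family flush lower bound described in the issue.
public class FlushLowerBound {
    // Assumed defaults, matching the config values quoted in the description.
    static final long DEFAULT_MEMSTORE_FLUSH_SIZE = 128L * 1024 * 1024;
    static final long DEFAULT_CONFIGURED_LOWER_BOUND = 16L * 1024 * 1024; // 16777216

    /**
     * Per-family lower bound: flush.size / column_family_number by default,
     * unless the explicitly configured property value is larger than that.
     */
    public static long lowerBound(long flushSize, int familyCount, long configuredBound) {
        long perFamily = flushSize / familyCount;
        return Math.max(perFamily, configuredBound);
    }
}
```

With an 8-family table and the defaults above, the bound comes out to exactly 16MB; fewer families raise it, and with many families the configured floor keeps tiny memstores from being flushed one by one.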
[jira] [Commented] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037692#comment-15037692 ] Vishal Khandelwal commented on HBASE-14905: --- [~ted_yu] and [~chenheng] : added patch with another test alongwith test provided by [~chenheng]. Please review the changes. Thanks [~chenheng] for incorporating the log comment. > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > GOODROWS=1 > Does not show any mismatch. Ideally it should show. This is because in > VerifyReplication Class maxVersion is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14923) VerifyReplication should not mask the exception during result comaprision
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Khandelwal updated HBASE-14923: -- Attachment: HBASE-14923_v1.patch > VerifyReplication should not mask the exception during result comaprision > -- > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 0.98.16 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Khandelwal updated HBASE-14923: -- Summary: VerifyReplication should not mask the exception during result comparison (was: VerifyReplication should not mask the exception during result comaprision ) > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 0.98.16 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated HBASE-14869: -- Attachment: 14869-v5-0.98.txt > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v3-0.98.txt, 14869-v4-0.98.txt, > 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14895) Seek only to the newly flushed file on scanner reset on flush
[ https://issues.apache.org/jira/browse/HBASE-14895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037735#comment-15037735 ] ramkrishna.s.vasudevan commented on HBASE-14895: I have a patch ready for this. Once HBASE-13082 is checked in, I will rebase the patch on top of that. I found some interesting things while doing this wrt the new shipped() call that we make. > Seek only to the newly flushed file on scanner reset on flush > - > > Key: HBASE-14895 > URL: https://issues.apache.org/jira/browse/HBASE-14895 > Project: HBase > Issue Type: Sub-task >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated HBASE-14869: -- Attachment: 14869-v2-2.0.txt > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certainly latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037746#comment-15037746 ] Vikas Vishwakarma commented on HBASE-14869: --- The core test failure does not look related; it shows the following issue: "java.net.BindException: Address already in use". Fixed the lineLengths issue and added a unit test in the attached patch. > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport
[ https://issues.apache.org/jira/browse/HBASE-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14915: -- Attachment: HBASE-14915-branch-1.2.patch Try this patch, [~stack] :) We should wait for the puts to complete before we check the data. > Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport > - > > Key: HBASE-14915 > URL: https://issues.apache.org/jira/browse/HBASE-14915 > Project: HBase > Issue Type: Sub-task > Components: hangingTests >Reporter: stack > Attachments: HBASE-14915-branch-1.2.patch > > > This test hangs a bunch: > Here is latest: > https://builds.apache.org/job/HBase-1.2/418/jdk=latest1.7,label=Hadoop/consoleText -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14918) In-Memory MemStore Flush and Compaction
Eshcar Hillel created HBASE-14918: - Summary: In-Memory MemStore Flush and Compaction Key: HBASE-14918 URL: https://issues.apache.org/jira/browse/HBASE-14918 Project: HBase Issue Type: Umbrella Affects Versions: 2.0.0 Reporter: Eshcar Hillel A memstore serves as the in-memory component of a store unit, absorbing all updates to the store. From time to time these updates are flushed to a file on disk, where they are compacted (by eliminating redundancies) and compressed (i.e., written in a compressed format to reduce their storage size). We aim to speed up data access, and therefore suggest to apply in-memory memstore flush. That is to flush the active in-memory segment into an intermediate buffer where it can be accessed by the application. Data in the buffer is subject to compaction and can be stored in any format that allows it to take up smaller space in RAM. The less space the buffer consumes the longer it can reside in memory before data is flushed to disk, resulting in better performance. Specifically, the optimization is beneficial for workloads with medium-to-high key churn which incur many redundant cells, like persistent messaging. We suggest to structure the solution as 3 subtasks (respectively, patches). (1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment (StoreSegment) as first-class citizen, and decoupling memstore scanner from the memstore implementation; (2) Implementation of a new memstore (CompactingMemstore) with non-optimized immutable segment representation, and (3) Memory optimization including compressed format representation and offheap allocations. This Jira continues the discussion in HBASE-13408. Design documents, evaluation results and previous patches can be found in HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
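The redundancy elimination HBASE-14918 describes ("compacted by eliminating redundancies") amounts to keeping only the newest version of each cell in the in-memory segment. A deliberately simplified sketch, much cruder than the real segment/scanner design and with hypothetical names (`InMemoryCompactionSketch`, cells modeled as string triples):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: in-memory compaction that drops all but the newest version of each row key.
public class InMemoryCompactionSketch {
    // Each cell is {rowKey, timestampString, value}.
    static Map<String, String> compact(List<String[]> cells) {
        Map<String, Long> newestTs = new HashMap<>();
        Map<String, String> compacted = new HashMap<>();
        for (String[] cell : cells) {
            long ts = Long.parseLong(cell[1]);
            Long seen = newestTs.get(cell[0]);
            if (seen == null || ts > seen) {
                // Newer version wins; the older value is the "redundancy" being eliminated.
                newestTs.put(cell[0], ts);
                compacted.put(cell[0], cell[2]);
            }
        }
        return compacted;
    }
}
```

For a workload with high key churn (e.g. persistent messaging, as the ticket notes), the compacted map can be far smaller than the raw update stream, which is exactly why the buffer can stay in RAM longer before a disk flush.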
[jira] [Created] (HBASE-14921) Memory optimizations
Eshcar Hillel created HBASE-14921: - Summary: Memory optimizations Key: HBASE-14921 URL: https://issues.apache.org/jira/browse/HBASE-14921 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Eshcar Hillel Memory optimizations including compressed format representation and offheap allocations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14920) Compacting Memstore
Eshcar Hillel created HBASE-14920: - Summary: Compacting Memstore Key: HBASE-14920 URL: https://issues.apache.org/jira/browse/HBASE-14920 Project: HBase Issue Type: Sub-task Reporter: Eshcar Hillel Assignee: Eshcar Hillel Implementation of a new compacting memstore with non-optimized immutable segment representation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14772) Improve zombie detector; be more discerning
[ https://issues.apache.org/jira/browse/HBASE-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037629#comment-15037629 ] Hudson commented on HBASE-14772: FAILURE: Integrated in HBase-Trunk_matrix #527 (See [https://builds.apache.org/job/HBase-Trunk_matrix/527/]) HBASE-14772 Improve zombie detector; be more discerning; part2; (stack: rev 69658ea4a916c8ea5e6dd7d056a548e8dce4e96d) * dev-support/test-patch.sh > Improve zombie detector; be more discerning > --- > > Key: HBASE-14772 > URL: https://issues.apache.org/jira/browse/HBASE-14772 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Attachments: 14772v3.patch, zombie.patch, zombiev2.patch > > > Currently, any surefire process with the hbase flag is a potential zombie. > Our zombie check currently takes a reading and if it finds candidate zombies, > it waits 30 seconds and then does another reading. If a concurrent build > going on, in both cases the zombie detector will come up positive though the > adjacent test run may be making progress; i.e. the cast of surefire processes > may have changed between readings but our detector just sees presence of > hbase surefire processes. 
> Here is example: > {code} > Suspicious java process found - waiting 30s to see if there are just slow to > stop > There appear to be 5 zombie tests, they should have been killed by surefire > but survived > 12823 surefirebooter852180186418035480.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7653 surefirebooter8579074445899448699.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 12614 surefirebooter136529596936417090.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7836 surefirebooter3217047564606450448.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 13566 surefirebooter2084039411151963494.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > BEGIN zombies jstack extract > END zombies jstack extract > {code} > 5 is the number of forked processes we allow when doing medium and large > tests so an adjacent build will always show as '5 zombies'. > Need to add discerning if list of processes changes between readings. > Can I also add a tag per build run that all forked processes pick up so I can > look at the current builds progeny only? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
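The "discerning" check stack asks for above — only flag surefire processes whose PIDs persist across both readings — reduces to a set intersection. A sketch of that core step (hypothetical Java names; the actual zombie-detector is a shell script):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a surefire process is a zombie suspect only if its PID shows up in BOTH readings,
// 30s apart; a changed cast of PIDs means an adjacent build is simply making progress.
public class ZombieDiffSketch {
    static Set<String> persistentPids(Set<String> firstReading, Set<String> secondReading) {
        Set<String> suspects = new HashSet<>(firstReading);
        suspects.retainAll(secondReading);
        return suspects;
    }
}
```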
[jira] [Commented] (HBASE-14749) Make changes to region_mover.rb to use RegionMover Java tool
[ https://issues.apache.org/jira/browse/HBASE-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037455#comment-15037455 ] Abhishek Singh Chouhan commented on HBASE-14749: [~stack] [~apurtell] Should i backport these changes to branch-1 and 0.98? > Make changes to region_mover.rb to use RegionMover Java tool > > > Key: HBASE-14749 > URL: https://issues.apache.org/jira/browse/HBASE-14749 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan > Fix For: 2.0.0 > > Attachments: HBASE-14749-v2.patch, HBASE-14749-v3.patch, > HBASE-14749-v3.patch, HBASE-14749-v4.patch, HBASE-14749-v5.patch, > HBASE-14749.patch, HBASE-14749.patch > > > With HBASE-13014 in, we can now replace the ruby script such that it invokes > the Java Tool. Also expose timeout and no-ack mode which were added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14917) Log in console if individual tests in test-patch.sh fail or pass.
[ https://issues.apache.org/jira/browse/HBASE-14917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037507#comment-15037507 ] Hadoop QA commented on HBASE-14917: --- {color:green}+1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16748//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16748//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16748//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16748//console This message is automatically generated. > Log in console if individual tests in test-patch.sh fail or pass. > - > > Key: HBASE-14917 > URL: https://issues.apache.org/jira/browse/HBASE-14917 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy >Priority: Minor > Attachments: HBASE-14917.patch > > > Got 2 runs like > https://issues.apache.org/jira/browse/HBASE-14865?focusedCommentId=15037056=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15037056 > where can't figure out what went wrong in patch testing. > Logging results from individual tests to console as discussed > [here|https://mail-archives.apache.org/mod_mbox/hbase-dev/201512.mbox/%3CCAAjhxrrL4-qty562%3DcMyBJ2xyhGqHi3MFAgf9ygrzQf1%2BZmHtw%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14906: -- Attachment: HBASE-14906.v3.patch Update patch to resolve UT failure, and retry HadoopQA > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, > HBASE-14906.v3.patch > > > When checking FlushLargeStoresPolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > <property> > <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name> > <value>16777216</value> > <description> > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > </description> > </property> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14916) Add checkstyle_report.py to other branches
[ https://issues.apache.org/jira/browse/HBASE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037457#comment-15037457 ] Hadoop QA commented on HBASE-14916: --- {color:red}-1 overall{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16747//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16747//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16747//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16747//console This message is automatically generated. > Add checkstyle_report.py to other branches > -- > > Key: HBASE-14916 > URL: https://issues.apache.org/jira/browse/HBASE-14916 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > Attachments: HBASE-14916-branch-1.patch > > > Given test-patch.sh is always run from master, and that it now uses > checkstyle_report.py, we should pull back the script to other branches too. > Otherwise we see error like: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/jenkins.build/dev-support/test-patch.sh: > line 662: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/dev-support/checkstyle_report.py: > No such file or directory > [reference|https://builds.apache.org/job/PreCommit-HBASE-Build/16734//consoleFull] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14919) Infrastructure refactoring
Eshcar Hillel created HBASE-14919: - Summary: Infrastructure refactoring Key: HBASE-14919 URL: https://issues.apache.org/jira/browse/HBASE-14919 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Eshcar Hillel Assignee: Eshcar Hillel Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as first-class citizen and decoupling memstore scanner from the memstore implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037472#comment-15037472 ] Duo Zhang commented on HBASE-14790: --- Oh, I think we could not fix HBASE-14004 without changing the replication module of HBase. No matter how we implement DFSOutputStream, think of this scenario: 1. The rs flushes a WAL entry to dn1, dn2 and dn3. 2. dn1 receives the WAL entry, and it is read by ReplicationSource and replicated to the slave cluster. 3. dn1 and the rs both crash; dn2 and dn3 have not received this WAL entry yet, and the rs has not bumped the GS of this block yet. 4. The NameNode completes the file with a length that does not contain this WAL entry, since the GS of the blocks on dn2 and dn3 is correct and the NameNode does not know there used to be a block with a longer length. 5. whoops... So I think every rs should keep an "acked length" of the WAL file currently being written, and when doing replication, ReplicationSource should ask for this length first before reading and not read beyond it. If we have this logic, then the implementation of the new "DFSOutputStream" is much simpler. We could just truncate the file to our "acked length" if writing the WAL failed on some datanode, and fail all the entries after the "acked length". This keeps everything consistent. Thanks. > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. 
> And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency (such as HBASE-14004) when > using the original DFSOutputStream due to its complicated logic. And the > complicated logic also forces us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 and not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
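Duo's "acked length" proposal above — replication must never read past the prefix every replica has acknowledged, and a failed write truncates back to it — can be sketched as a small tracker. All names here (`AckedLengthTracker`, `onAllReplicasAcked`, etc.) are hypothetical; the real WAL/ReplicationSource wiring is considerably more involved:

```java
// Sketch of the "acked length" bookkeeping for a WAL file currently being written.
public class AckedLengthTracker {
    private long writtenLength = 0; // bytes handed to the WAL output stream
    private long ackedLength = 0;   // bytes confirmed durable on every datanode replica

    synchronized void onWrite(int bytes) { writtenLength += bytes; }

    // Called when all replicas have acknowledged the prefix up to 'length'.
    // Never exceeds what was actually written, and never moves backwards.
    synchronized void onAllReplicasAcked(long length) {
        ackedLength = Math.max(ackedLength, Math.min(length, writtenLength));
    }

    // ReplicationSource asks for this before reading, and must not read beyond it.
    synchronized long safeReadLimit() { return ackedLength; }

    // On a write failure, truncate the file back to this offset and fail later entries.
    synchronized long truncateTarget() { return ackedLength; }
}
```

This directly addresses the crash scenario in the comment: the entry seen only by dn1 was never acked by dn2/dn3, so it stays above the acked length and is never handed to replication.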
[jira] [Commented] (HBASE-14749) Make changes to region_mover.rb to use RegionMover Java tool
[ https://issues.apache.org/jira/browse/HBASE-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037491#comment-15037491 ] Hudson commented on HBASE-14749: FAILURE: Integrated in HBase-Trunk_matrix #526 (See [https://builds.apache.org/job/HBase-Trunk_matrix/526/]) HBASE-14749 Make changes to region_mover.rb to use RegionMover Java tool (stack: rev 91945d7f490cbb855a1d737d1979bf9931b0f2bd) * bin/rolling-restart.sh * bin/thread-pool.rb * hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionMover.java * bin/graceful_stop.sh * bin/region_mover.rb > Make changes to region_mover.rb to use RegionMover Java tool > > > Key: HBASE-14749 > URL: https://issues.apache.org/jira/browse/HBASE-14749 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan > Fix For: 2.0.0 > > Attachments: HBASE-14749-v2.patch, HBASE-14749-v3.patch, > HBASE-14749-v3.patch, HBASE-14749-v4.patch, HBASE-14749-v5.patch, > HBASE-14749.patch, HBASE-14749.patch > > > With HBASE-13014 in, we can now replace the ruby script such that it invokes > the Java Tool. Also expose timeout and no-ack mode which were added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14772) Improve zombie detector; be more discerning
[ https://issues.apache.org/jira/browse/HBASE-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037492#comment-15037492 ] Hudson commented on HBASE-14772: FAILURE: Integrated in HBase-Trunk_matrix #526 (See [https://builds.apache.org/job/HBase-Trunk_matrix/526/]) HBASE-14772 Improve zombie detector; be more discerning; part2 (stack: rev cf8d3bd641ef9f69dabecec1b9e87272493fe825) * dev-support/zombie-detector.sh * dev-support/test-patch.sh > Improve zombie detector; be more discerning > --- > > Key: HBASE-14772 > URL: https://issues.apache.org/jira/browse/HBASE-14772 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Attachments: 14772v3.patch, zombie.patch, zombiev2.patch > > > Currently, any surefire process with the hbase flag is a potential zombie. > Our zombie check currently takes a reading and if it finds candidate zombies, > it waits 30 seconds and then does another reading. If a concurrent build > going on, in both cases the zombie detector will come up positive though the > adjacent test run may be making progress; i.e. the cast of surefire processes > may have changed between readings but our detector just sees presence of > hbase surefire processes. 
> Here is example: > {code} > Suspicious java process found - waiting 30s to see if there are just slow to > stop > There appear to be 5 zombie tests, they should have been killed by surefire > but survived > 12823 surefirebooter852180186418035480.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7653 surefirebooter8579074445899448699.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 12614 surefirebooter136529596936417090.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 7836 surefirebooter3217047564606450448.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > 13566 surefirebooter2084039411151963494.jar -enableassertions -Dhbase.test > -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom > -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true > BEGIN zombies jstack extract > END zombies jstack extract > {code} > 5 is the number of forked processes we allow when doing medium and large > tests so an adjacent build will always show as '5 zombies'. > Need to add discerning if list of processes changes between readings. > Can I also add a tag per build run that all forked processes pick up so I can > look at the current builds progeny only? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037519#comment-15037519 ] Hadoop QA commented on HBASE-14906: --- {color:green}+1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16749//console This message is automatically generated. > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch > > > When checking FlushLargeStoragePolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > > hbase.hregion.percolumnfamilyflush.size.lower.bound > 16777216 > > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. 
The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037561#comment-15037561 ] Yu Li commented on HBASE-14906: --- >From the HadoopQA report, observe below failures (errors) although it says +1: {noformat} Tests in error: org.apache.hadoop.hbase.regionserver.TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor(org.apache.hadoop.hbase.regionserver.TestBulkLoad) Run 1: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor � Run 2: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor � Run 3: TestBulkLoad.bulkHLogShouldThrowErrorWhenFamilySpecifiedAndHFileExistsButNotInTableDescriptor � org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad) Run 1: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath � Unexpected ex... Run 2: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath � Unexpected ex... Run 3: TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath � Unexpected ex... {noformat} And detailed exception: {noformat} shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(org.apache.hadoop.hbase.regionserver.TestBulkLoad) Time elapsed: 0.043 sec <<< ERROR! 
java.lang.Exception: Unexpected exception, expected but was at org.apache.hadoop.hbase.regionserver.FlushLargeStoresPolicy.configureForRegion(FlushLargeStoresPolicy.java:59) at org.apache.hadoop.hbase.regionserver.FlushPolicyFactory.create(FlushPolicyFactory.java:52) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:845) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:786) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6195) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6204) at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:239) at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:249) at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldThrowErrorIfBadFamilySpecifiedAsFamilyPath(TestBulkLoad.java:207) {noformat} Since these are Errors, not Failures, the tests stopped in the middle. The issue is caused by the patch here since it doesn't handle the case where the column family number is zero; although this won't happen in the real world, it is possible in unit test cases like TestBulkLoad. > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch > > > When checking FlushLargeStoresPolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. 
We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > <property> > <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name> > <value>16777216</value> > <description> > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > </description> > </property> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
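The proposed default above — flush.size / column_family_number, unless the configured lower bound is larger — together with a guard for the zero/one-family case the TestBulkLoad errors point at, could look roughly like this. A sketch only, with hypothetical names, not the actual v3/v4 patch:

```java
// Sketch of the proposed per-column-family flush lower bound from HBASE-14906.
public class LowerBoundSketch {
    /**
     * @param flushSize   value of hbase.hregion.memstore.flush.size
     * @param familyCount number of column families in the region
     * @param configured  value of hbase.hregion.percolumnfamilyflush.size.lower.bound
     */
    static long lowerBound(long flushSize, int familyCount, long configured) {
        if (familyCount <= 1) {
            // Zero or one family: per-family selection is pointless (the TestBulkLoad
            // error was a missing guard for familyCount == 0), so flush everything.
            return flushSize;
        }
        long derived = flushSize / familyCount;
        // Use the configured value only when it is larger than the derived default.
        return Math.max(configured, derived);
    }
}
```

With the stock 128MB flush size and 4 families, the derived default is 32MB, which overrides the old 16MB default; a user who explicitly configures something larger than the derived value still gets their setting.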
[jira] [Commented] (HBASE-14701) Fix flakey Failed tests: TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 null
[ https://issues.apache.org/jira/browse/HBASE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037570#comment-15037570 ] Jingcheng Du commented on HBASE-14701: -- The cause of this issue is found. testSkipFlushTableSnapshot tries to check if the snapshot family (which must have store files) contains a provided family. It will be wrong when there are no store files under the snapshot family. In this case, if the memstore flushing is slower than the start of the admin.snapshot, the issue comes up. We need to make sure the flush is finished before snapshot starts. I will provide a patch to fix this. > Fix flakey Failed tests: > TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 > null > -- > > Key: HBASE-14701 > URL: https://issues.apache.org/jira/browse/HBASE-14701 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: Jingcheng Du > Attachments: disable.txt > > > This test has failed twice in last 24 hours. I removed it from master for now > over in HBASE-14678. It fails a lot. See here: > https://builds.apache.org/job/HBase-TRUNK/6962/testReport/history/ It > recently got refactored to remove a bunch of duplicated code. Assigning to > [~jingcheng...@intel.com] to take a look if you have a chance please. > Otherwise, unassign. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14701) Fix flakey Failed tests: TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 null
[ https://issues.apache.org/jira/browse/HBASE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14701: - Status: Patch Available (was: Open) > Fix flakey Failed tests: > TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 > null > -- > > Key: HBASE-14701 > URL: https://issues.apache.org/jira/browse/HBASE-14701 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: Jingcheng Du > Attachments: HBASE-14701.patch, disable.txt > > > This test has failed twice in last 24 hours. I removed it from master for now > over in HBASE-14678. It fails a lot. See here: > https://builds.apache.org/job/HBase-TRUNK/6962/testReport/history/ It > recently got refactored to remove a bunch of duplicated code. Assigning to > [~jingcheng...@intel.com] to take a look if you have a chance please. > Otherwise, unassign. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14922) Delayed flush doesn't work causing flush storms.
Elliott Clark created HBASE-14922: - Summary: Delayed flush doesn't work causing flush storms. Key: HBASE-14922 URL: https://issues.apache.org/jira/browse/HBASE-14922 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Starting all regionservers at the same time means that most PeriodicMemstoreFlushers will be running at the same time, so all of these threads will queue flushes at about the same time. This was supposed to be mitigated by the Delayed interface; however, it isn't used at all. This results in the flush queues immediately filling up and then draining every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
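The fix direction (actually honoring a delay so periodic flushes spread out rather than all firing at once) can be sketched in plain Java; the jitter math below is illustrative, not the actual HBase implementation:

```java
import java.util.Random;

// Sketch: give each periodic flush a randomized delay so that regionservers
// started at the same time do not all queue flushes at the same instant.
public class JitteredFlushDelay {
    static final long PERIOD_MS = 3_600_000L; // hourly flush check

    // Delay in [0, PERIOD_MS): spreads queued flushes across the whole period
    // instead of letting the queue fill up and drain in one burst.
    static long randomDelay(Random rng) {
        return (long) (rng.nextDouble() * PERIOD_MS);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println("flush " + i + " delayed " + randomDelay(rng) + " ms");
        }
    }
}
```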
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037630#comment-15037630 ] Yu Li commented on HBASE-14790: --- Agree that we may not fix HBASE-14004 by simply implementing a new DFSOutputStream, but I think the FanoutOutputStream is still useful to reduce WAL sync latency. Pipeline is good for throughput but not for latency. > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14701) Fix flakey Failed tests: TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 null
[ https://issues.apache.org/jira/browse/HBASE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-14701: - Attachment: HBASE-14701.patch Upload the patch that adds TestMobFlushSnapshotFromClient and TestFlushSnapshotFromClient.testSkipFlushTableSnapshot back. > Fix flakey Failed tests: > TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 > null > -- > > Key: HBASE-14701 > URL: https://issues.apache.org/jira/browse/HBASE-14701 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: Jingcheng Du > Attachments: HBASE-14701.patch, disable.txt > > > This test has failed twice in last 24 hours. I removed it from master for now > over in HBASE-14678. It fails a lot. See here: > https://builds.apache.org/job/HBase-TRUNK/6962/testReport/history/ It > recently got refactored to remove a bunch of duplicated code. Assigning to > [~jingcheng...@intel.com] to take a look if you have a chance please. > Otherwise, unassign. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14904) Mark Base[En|De]coder LimitedPrivate and fix binary compat issue
[ https://issues.apache.org/jira/browse/HBASE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039724#comment-15039724 ] Hudson commented on HBASE-14904: SUCCESS: Integrated in HBase-1.2-IT #325 (See [https://builds.apache.org/job/HBase-1.2-IT/325/]) HBASE-14904 Mark Base[En|De]coder LimitedPrivate and fix binary compat (enis: rev a75a93f98ca003a172cc966464308b013b1769e4) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseEncoder.java * hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java > Mark Base[En|De]coder LimitedPrivate and fix binary compat issue > > > Key: HBASE-14904 > URL: https://issues.apache.org/jira/browse/HBASE-14904 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: hbase-14904_v1.patch, hbase-14904_v2.patch > > > PHOENIX-2477 revealed that the changes from HBASE-14501 breaks binary > compatibility in Phoenix compiled with earlier versions of HBase and run > agains later versions. > This is one of the areas that the boundary is not clear, but it won't hurt us > to fix it. 
> The exception trace is: > {code} > Exception in thread "main" java.lang.NoSuchFieldError: in > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$PhoenixBaseDecoder.(IndexedWALEditCodec.java:106) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec$IndexKeyValueDecoder.(IndexedWALEditCodec.java:121) > at > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec.getDecoder(IndexedWALEditCodec.java:63) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:292) > at > org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82) > at > org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:148) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:316) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:281) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:269) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:418) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.processFile(WALPrettyPrinter.java:247) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.run(WALPrettyPrinter.java:422) > at > org.apache.hadoop.hbase.wal.WALPrettyPrinter.main(WALPrettyPrinter.java:357) > {code} > Although {{BaseDecoder.in}} is still there, it got changed to be a class > rather than an interface. BaseDecoder is marked Private, thus the binary > compat check is not run at all. Not sure whether it would have caught this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039723#comment-15039723 ] Hudson commented on HBASE-14905: SUCCESS: Integrated in HBase-1.2-IT #325 (See [https://builds.apache.org/job/HBase-1.2-IT/325/]) HBASE-14905 VerifyReplication does not honour versions option (Vishal (tedyu: rev eb777ef289827ae385c0cae71ea64cd6618e14af) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java * hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > GOODROWS=1 > Does not 
show any mismatch. Ideally it should show. This is because in > VerifyReplication Class maxVersion is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
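What "honouring the versions option" means can be shown with a toy model: compare up to N most-recent versions per cell on both clusters, rather than only the newest (pure Java, no HBase API; the data mirrors the scans in the report):

```java
import java.util.Arrays;
import java.util.List;

// Toy model: each list holds cell values for one row/column, newest first
// (as a raw scan with VERSIONS => 100 would return them).
public class VersionAwareCompare {
    static boolean matches(List<String> source, List<String> target, int maxVersions) {
        // Only the maxVersions most recent versions take part in the check.
        List<String> s = source.subList(0, Math.min(maxVersions, source.size()));
        List<String> t = target.subList(0, Math.min(maxVersions, target.size()));
        return s.equals(t);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("value1112", "value1001", "value1002");
        List<String> target = Arrays.asList("value1112", "value1112", "value",
                                            "value1001", "value1002");
        // With only the newest version compared, the rows look identical...
        System.out.println(matches(source, target, 1));   // true
        // ...but with versions=100 the extra target versions are a mismatch.
        System.out.println(matches(source, target, 100)); // false
    }
}
```

This is why the unset maxVersion produced GOODROWS=1: effectively only the latest version was compared.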
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039770#comment-15039770 ] Duo Zhang commented on HBASE-14790: --- https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf The design doc of hflush and append already said that if all datanodes restart, then no guarantee could be provided. I think this is reasonable. Even if hflush succeeded, we could kill all the datanodes in the pipeline and the client and restart; the file after lease recovery could be shorter than the acked length. There will always be situations where we cannot know there is data loss, unless we call fsync every time to update the length on the namenode when writing the WAL, I think. :( > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041147#comment-15041147 ] Duo Zhang commented on HBASE-14790: --- And I found that {{hsync}} and {{hflush}} have different ack flows. {{hsync}} only sends the ack back when the data has been successfully synced to local disk, so I think using {{hsync}} is enough to detect whether there is data loss (forget the {{fsync}}). > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-14919: -- Attachment: HBASE-14919-V01.patch patch attached > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-14919: -- Status: Patch Available (was: Open) > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14701) Fix flakey Failed tests: TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 null
[ https://issues.apache.org/jira/browse/HBASE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037863#comment-15037863 ] Hadoop QA commented on HBASE-14701: --- {color:green}+1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16751//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16751//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16751//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16751//console This message is automatically generated. > Fix flakey Failed tests: > TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 > null > -- > > Key: HBASE-14701 > URL: https://issues.apache.org/jira/browse/HBASE-14701 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: Jingcheng Du > Attachments: HBASE-14701.patch, disable.txt > > > This test has failed twice in last 24 hours. I removed it from master for now > over in HBASE-14678. It fails a lot. See here: > https://builds.apache.org/job/HBase-TRUNK/6962/testReport/history/ It > recently got refactored to remove a bunch of duplicated code. Assigning to > [~jingcheng...@intel.com] to take a look if you have a chance please. > Otherwise, unassign. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14905) VerifyReplication does not honour versions option
[ https://issues.apache.org/jira/browse/HBASE-14905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037784#comment-15037784 ] Heng Chen commented on HBASE-14905: --- please fix Indent errors in patch v4. > VerifyReplication does not honour versions option > - > > Key: HBASE-14905 > URL: https://issues.apache.org/jira/browse/HBASE-14905 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > Fix For: 2.0.0 > > Attachments: 14905-v2.txt, HBASE-14905.patch, HBASE-14905_v3.patch, > HBASE-14905_v4.patch, test.patch > > > source: > hbase(main):001:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > > target: > hbase(main):023:0> scan 't1', {RAW => true, VERSIONS => 100} > ROW COLUMN+CELL > > > r1 column=f1:, timestamp=1449030102091, > value=value1112 > > r1 column=f1:, timestamp=1449030090758, > value=value1112 > > r1 column=f1:, timestamp=1449029984282, > value=value > > r1 column=f1:, timestamp=1449029774173, > value=value1001 > > r1 column=f1:, timestamp=1449029709974, > value=value1002 > /bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication > --versions=100 1 t1 > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters > GOODROWS=1 > Does not show any mismatch. Ideally it should show. This is because in > VerifyReplication Class maxVersion is not correctly set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037831#comment-15037831 ] Eshcar Hillel commented on HBASE-13408: --- This is an umbrella Jira continuing the current issue. > HBase In-Memory Memstore Compaction > --- > > Key: HBASE-13408 > URL: https://issues.apache.org/jira/browse/HBASE-13408 > Project: HBase > Issue Type: New Feature >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-13408-trunk-v01.patch, > HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, > HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, > HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, > HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, > HBASE-13408-trunk-v10.patch, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, > InMemoryMemstoreCompactionEvaluationResults.pdf, > InMemoryMemstoreCompactionMasterEvaluationResults.pdf, > InMemoryMemstoreCompactionScansEvaluationResults.pdf, > StoreSegmentandStoreSegmentScannerClassHierarchies.pdf > > > A store unit holds a column family in a region, where the memstore is its > in-memory component. The memstore absorbs all updates to the store; from time > to time these updates are flushed to a file on disk, where they are > compacted. Unlike disk components, the memstore is not compacted until it is > written to the filesystem and optionally to block-cache. This may result in > underutilization of the memory due to duplicate entries per row, for example, > when hot data is continuously updated. > Generally, the faster the data is accumulated in memory, more flushes are > triggered, the data sinks to disk more frequently, slowing down retrieval of > data, even if very recent. 
> In high-churn workloads, compacting the memstore can help maintain the data > in memory, and thereby speed up data retrieval. > We suggest a new compacted memstore with the following principles: > 1.The data is kept in memory for as long as possible > 2.Memstore data is either compacted or in process of being compacted > 3.Allow a panic mode, which may interrupt an in-progress compaction and > force a flush of part of the memstore. > We suggest applying this optimization only to in-memory column families. > A design document is attached. > This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14906: -- Attachment: HBASE-14906.v4.patch Another patch to resolve regression UT failure caused by not updating property name in TestPerColumnFamilyFlush after renaming global config {{hbase.hregion.percolumnfamilyflush.size.lower.bound}} > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, > HBASE-14906.v3.patch, HBASE-14906.v4.patch > > > When checking FlushLargeStoragePolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > > hbase.hregion.percolumnfamilyflush.size.lower.bound > 16777216 > > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
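The proposed default in the description can be sketched as a small calculation; the property handling here is a simplification of the actual patch, and the class name is illustrative:

```java
// Sketch of the lower-bound selection proposed in HBASE-14906:
// use hbase.hregion.memstore.flush.size / column_family_number by default,
// and fall back to the configured property value when one is set explicitly.
public class FlushLowerBound {
    static long lowerBound(Long configured, long memstoreFlushSize, int familyCount) {
        if (configured != null) {
            return configured;                  // user set the property explicitly
        }
        return memstoreFlushSize / familyCount; // proposed default
    }

    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024;    // default 128 MB flush size
        // Four families, nothing configured: bound is flushSize / 4.
        System.out.println(lowerBound(null, flushSize, 4));      // 33554432
        // Explicitly configured 16 MB overrides the computed default.
        System.out.println(lowerBound(16777216L, flushSize, 4)); // 16777216
    }
}
```

A family's memstore is then flushed only when its size exceeds this bound; if no family exceeds it, all memstores are flushed as usual.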
[jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
[ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037817#comment-15037817 ] Eshcar Hillel commented on HBASE-14918: --- Submitted patch for first sub-task > In-Memory MemStore Flush and Compaction > --- > > Key: HBASE-14918 > URL: https://issues.apache.org/jira/browse/HBASE-14918 > Project: HBase > Issue Type: Umbrella >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel > > A memstore serves as the in-memory component of a store unit, absorbing all > updates to the store. From time to time these updates are flushed to a file > on disk, where they are compacted (by eliminating redundancies) and > compressed (i.e., written in a compressed format to reduce their storage > size). > We aim to speed up data access, and therefore suggest to apply in-memory > memstore flush. That is to flush the active in-memory segment into an > intermediate buffer where it can be accessed by the application. Data in the > buffer is subject to compaction and can be stored in any format that allows > it to take up smaller space in RAM. The less space the buffer consumes the > longer it can reside in memory before data is flushed to disk, resulting in > better performance. > Specifically, the optimization is beneficial for workloads with > medium-to-high key churn which incur many redundant cells, like persistent > messaging. > We suggest to structure the solution as 3 subtasks (respectively, patches). > (1) Infrastructure - refactoring of the MemStore hierarchy, introducing > segment (StoreSegment) as first-class citizen, and decoupling memstore > scanner from the memstore implementation; > (2) Implementation of a new memstore (CompactingMemstore) with non-optimized > immutable segment representation, and > (3) Memory optimization including compressed format representation and > offheap allocations. > This Jira continues the discussion in HBASE-13408. 
> Design documents, evaluation results and previous patches can be found in > HBASE-13408. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037825#comment-15037825 ] Eshcar Hillel commented on HBASE-13408: --- Created new Jira HBASE-14918 with three sub-tasks and submitted a patch for the first refactoring task. This Jira is EOL; if you wish to continue following this issue, please start watching HBASE-14918 (and/or HBASE-14919/HBASE-14920/HBASE-14921). > HBase In-Memory Memstore Compaction > --- > > Key: HBASE-13408 > URL: https://issues.apache.org/jira/browse/HBASE-13408 > Project: HBase > Issue Type: New Feature >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-13408-trunk-v01.patch, > HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, > HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, > HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, > HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, > HBASE-13408-trunk-v10.patch, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf, > HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, > InMemoryMemstoreCompactionEvaluationResults.pdf, > InMemoryMemstoreCompactionMasterEvaluationResults.pdf, > InMemoryMemstoreCompactionScansEvaluationResults.pdf, > StoreSegmentandStoreSegmentScannerClassHierarchies.pdf > > > A store unit holds a column family in a region, where the memstore is its > in-memory component. The memstore absorbs all updates to the store; from time > to time these updates are flushed to a file on disk, where they are > compacted. Unlike disk components, the memstore is not compacted until it is > written to the filesystem and optionally to block-cache. This may result in > underutilization of the memory due to duplicate entries per row, for example, > when hot data is continuously updated. 
> Generally, the faster the data is accumulated in memory, more flushes are > triggered, the data sinks to disk more frequently, slowing down retrieval of > data, even if very recent. > In high-churn workloads, compacting the memstore can help maintain the data > in memory, and thereby speed up data retrieval. > We suggest a new compacted memstore with the following principles: > 1.The data is kept in memory for as long as possible > 2.Memstore data is either compacted or in process of being compacted > 3.Allow a panic mode, which may interrupt an in-progress compaction and > force a flush of part of the memstore. > We suggest applying this optimization only to in-memory column families. > A design document is attached. > This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
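The core of the proposed in-memory compaction, eliminating redundant row versions before anything reaches disk, can be illustrated with a toy example (not the actual memstore data structures):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory compaction: in a high-churn workload, many updates to the
// same row pile up in the memstore; compacting keeps only the latest version,
// so the segment stays small and can live in memory longer before flushing.
public class MemstoreCompactionSketch {
    // Updates arrive oldest-first; compaction keeps one version per row key.
    @SafeVarargs
    static Map<String, String> compact(Map.Entry<String, String>... updates) {
        Map<String, String> compacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : updates) {
            compacted.put(e.getKey(), e.getValue()); // later update wins
        }
        return compacted;
    }

    public static void main(String[] args) {
        Map<String, String> compacted = compact(
            Map.entry("r1", "v1"), Map.entry("r2", "v1"),
            Map.entry("r1", "v2"), Map.entry("r1", "v3"));
        // Four updates collapse to two live rows; r1 keeps only "v3".
        System.out.println(compacted); // {r1=v3, r2=v1}
    }
}
```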
[jira] [Created] (HBASE-14924) Slow response from HBase RESTful interface
Moulay Amine Jaidi created HBASE-14924: -- Summary: Slow response from HBase RESTful interface Key: HBASE-14924 URL: https://issues.apache.org/jira/browse/HBASE-14924 Project: HBase Issue Type: Brainstorming Components: REST Affects Versions: 1.1.1 Environment: IBM Biginsights 4.1 Reporter: Moulay Amine Jaidi Priority: Blocker We are currently experiencing an issue with HBase through the REST interface. Previously we were on version 0.96 and were able to run the following REST command successfully and very quickly: http://10.92.211.22:60800/tableName/RAWKEY.* Since upgrading to 1.1.1, this request takes a lot longer to retrieve results (the count is 12 items to return). Are there any configurations or known issues that may affect this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038063#comment-15038063 ] Hadoop QA commented on HBASE-14906: --- {color:green}+1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16755//console This message is automatically generated. > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, > HBASE-14906.v3.patch, HBASE-14906.v4.patch > > > When checking FlushLargeStoragePolicy, found below possible improving points: > 1. Currently in selectStoresToFlush, we will do the selection no matter how > many actual families, which is not necessary for one single family > 2. Default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > could not fit in all cases, and requires user to know details of the > implementation to properly set it. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > > hbase.hregion.percolumnfamilyflush.size.lower.bound > 16777216 > > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. 
The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless value of this property is larger than that. If none of the families > have their memstore size more than lower bound, all the memstores will be > flushed (just as usual). > > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038167#comment-15038167 ] stack commented on HBASE-14790: --- bq. So I think every rs should keep an "acked length" of the current writing WAL file, and when doing replication HBase owning this fact is the way to go. There is no way to get this info from the current dfsclient, right? It would be a new piece of metadata that this new work would reveal? > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
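The "acked length" idea discussed here, the regionserver itself remembering how much of the current WAL file has been acknowledged as durable instead of asking the dfsclient, can be sketched in plain Java (all names hypothetical, not HBase or HDFS API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a WAL writer that tracks the highest byte offset acknowledged by
// the datanode pipeline. Readers (e.g. replication) must not trust bytes
// past it, since an unacked tail can vanish after lease recovery.
public class AckedLengthTracker {
    private final AtomicLong writtenLength = new AtomicLong();
    private final AtomicLong ackedLength = new AtomicLong();

    long write(int numBytes) {                 // bytes handed to the pipeline
        return writtenLength.addAndGet(numBytes);
    }

    void onAck(long ackedUpTo) {               // ack callback from the pipeline
        ackedLength.accumulateAndGet(ackedUpTo, Math::max);
    }

    long safeReadLength() {                    // what replication may consume
        return ackedLength.get();
    }

    public static void main(String[] args) {
        AckedLengthTracker wal = new AckedLengthTracker();
        wal.write(100);
        wal.onAck(100);
        long tail = wal.write(40);             // 40 bytes not yet acked
        System.out.println(wal.safeReadLength() + " of " + tail); // 100 of 140
    }
}
```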
[jira] [Commented] (HBASE-14906) Improvements on FlushLargeStoresPolicy
[ https://issues.apache.org/jira/browse/HBASE-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038139#comment-15038139 ] Yu Li commented on HBASE-14906: --- Confirmed no more UT failures from the testReport. However, the report still looks strange: the summaries about core tests, javadoc, etc. seem to have disappeared. [~stack] could you please take a look here sir? > Improvements on FlushLargeStoresPolicy > -- > > Key: HBASE-14906 > URL: https://issues.apache.org/jira/browse/HBASE-14906 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14906.patch, HBASE-14906.v2.patch, > HBASE-14906.v3.patch, HBASE-14906.v4.patch > > > When checking FlushLargeStoresPolicy, found the following possible improvements: > 1. Currently in selectStoresToFlush, we do the selection no matter how > many families there actually are, which is unnecessary for a single family > 2. The default value for hbase.hregion.percolumnfamilyflush.size.lower.bound > cannot fit all cases, and requires the user to know details of the > implementation to set it properly. We propose to use > "hbase.hregion.memstore.flush.size/column_family_number" instead: > {noformat} > <property> > <name>hbase.hregion.percolumnfamilyflush.size.lower.bound</name> > <value>16777216</value> > <description> > If FlushLargeStoresPolicy is used and there are multiple column families, > then every time that we hit the total memstore limit, we find out all the > column families whose memstores exceed a "lower bound" and only flush them > while retaining the others in memory. The "lower bound" will be > "hbase.hregion.memstore.flush.size / column_family_number" by default > unless the value of this property is larger than that. If none of the families > have their memstore size more than the lower bound, all the memstores will be > flushed (just as usual). > </description> > </property> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
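The proposed default can be read as: effective lower bound = max(configured value, flush size / family count). A minimal sketch of that rule — a hypothetical helper, not HBase source; the method name is illustrative:

```java
public class FlushLowerBound {
    // Proposed default: hbase.hregion.memstore.flush.size / column_family_number,
    // unless the explicitly configured lower bound is larger than that.
    public static long effectiveLowerBound(long memstoreFlushSize, long configured, int familyCount) {
        long derived = memstoreFlushSize / familyCount;
        return Math.max(configured, derived);
    }

    public static void main(String[] args) {
        // 128 MB region flush size, 16 MB configured floor, 4 families -> 32 MB
        System.out.println(effectiveLowerBound(128L << 20, 16L << 20, 4));
    }
}
```

This is why users no longer need to know the implementation details: a region with many families automatically gets a proportionally smaller per-family threshold, while an explicitly larger configured value still wins.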
[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
[ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038178#comment-15038178 ] Zhe Zhang commented on HBASE-14790: --- [~stack] {{DataStreamer#block}} tracks the "number of bytes acked". It is returned by {{DFSOutputStream#getBlock}} [~Apache9] I'm still reading your analysis, will get back shortly > Implement a new DFSOutputStream for logging WAL only > > > Key: HBASE-14790 > URL: https://issues.apache.org/jira/browse/HBASE-14790 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang > > The original {{DFSOutputStream}} is very powerful and aims to serve all > purposes. But in fact, we do not need most of the features if we only want to > log WAL. For example, we do not need pipeline recovery since we could just > close the old logger and open a new one. And also, we do not need to write > multiple blocks since we could also open a new logger if the old file is too > large. > And the most important thing is that, it is hard to handle all the corner > cases to avoid data loss or data inconsistency(such as HBASE-14004) when > using original DFSOutputStream due to its complicated logic. And the > complicated logic also force us to use some magical tricks to increase > performance. For example, we need to use multiple threads to call {{hflush}} > when logging, and now we use 5 threads. But why 5 not 10 or 100? > So here, I propose we should implement our own {{DFSOutputStream}} when > logging WAL. For correctness, and also for performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038145#comment-15038145 ] Hadoop QA commented on HBASE-14919: --- {color:red}-1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16756//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16756//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16756//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16756//console This message is automatically generated. > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14701) Fix flakey Failed tests: TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 null
[ https://issues.apache.org/jira/browse/HBASE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038172#comment-15038172 ] stack commented on HBASE-14701: --- Thanks [~jingcheng...@intel.com] I've broken hadoopqa for the moment, hence the odd-looking report. Will retry your patch when stuff is put together again. > Fix flakey Failed tests: > TestMobFlushSnapshotFromClient>TestFlushSnapshotFromClient.testSkipFlushTableSnapshot:199 > null > -- > > Key: HBASE-14701 > URL: https://issues.apache.org/jira/browse/HBASE-14701 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: Jingcheng Du > Attachments: HBASE-14701.patch, disable.txt > > > This test has failed twice in the last 24 hours. I removed it from master for now > over in HBASE-14678. It fails a lot. See here: > https://builds.apache.org/job/HBase-TRUNK/6962/testReport/history/ It > recently got refactored to remove a bunch of duplicated code. Assigning to > [~jingcheng...@intel.com] to take a look if you have a chance please. > Otherwise, unassign. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037897#comment-15037897 ] Hadoop QA commented on HBASE-14923: --- {color:red}-1 overall{color}. {color:green}+1 core zombie tests -- no zombies!{color}. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16753//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16753//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16753//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16753//console This message is automatically generated. > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 0.98.16 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
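The fix described in HBASE-14923 above is small: keep counting the row as CONTENT_DIFFERENT_ROWS, but log the swallowed exception first. A hedged sketch of the shape of that change — the class, counter field, and comparison are illustrative stand-ins, not the actual VerifyReplication mapper:

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

public class VerifyRowSketch {
    private static final Logger LOG = Logger.getLogger("VerifyReplication");
    static long contentDifferentRows = 0;

    // Compare a source row against its replica; on failure, log the cause
    // instead of masking it, then bump the counter as before.
    static void verify(String rowKey, byte[] source, byte[] replica) {
        try {
            if (!Arrays.equals(source, replica)) {
                throw new IllegalStateException("content mismatch");
            }
        } catch (Exception e) {
            // The one-line addition the issue asks for: surface the reason.
            LOG.log(Level.SEVERE, "Row " + rowKey + " differs between clusters", e);
            contentDifferentRows++;
        }
    }
}
```

The counter semantics are unchanged; only the diagnostic information for the failed comparison is added.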
[jira] [Updated] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14869: --- Fix Version/s: 0.98.17 1.3.0 2.0.0 > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14924) Slow response from HBASE REStful interface
[ https://issues.apache.org/jira/browse/HBASE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038494#comment-15038494 ] Moulay Amine Jaidi commented on HBASE-14924: Thanks Andrew > Slow response from HBASE REStful interface > -- > > Key: HBASE-14924 > URL: https://issues.apache.org/jira/browse/HBASE-14924 > Project: HBase > Issue Type: Brainstorming > Components: REST >Affects Versions: 1.1.1 > Environment: IBM Biginsights 4.1 >Reporter: Moulay Amine Jaidi >Priority: Blocker > Labels: REST, hbase-rest, slow-scan > > We are currently experiencing an issue with HBase through the REST interface. > Previously we were on version 0.96 and were able to run the following REST > command successfully and very quickly > http://10.92.211.22:60800/tableName/RAWKEY.* > At the moment, after upgrading to 1.1.1, this request takes a lot longer > to retrieve results (count is 12 items to return) > Are there any configurations or known issues that may affect this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14907) NPE of MobUtils.hasMobColumns in Build failed in Jenkins: HBase-Trunk_matrix » latest1.8,Hadoop #513
[ https://issues.apache.org/jira/browse/HBASE-14907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038390#comment-15038390 ] Ted Yu commented on HBASE-14907: lgtm > NPE of MobUtils.hasMobColumns in Build failed in Jenkins: HBase-Trunk_matrix > » latest1.8,Hadoop #513 > > > Key: HBASE-14907 > URL: https://issues.apache.org/jira/browse/HBASE-14907 > Project: HBase > Issue Type: Bug > Components: mob >Reporter: Jingcheng Du >Assignee: Jingcheng Du > Attachments: HBASE-14907-V2.patch, HBASE-14907.patch > > > An NPE is thrown when rolling back a failed table creation: > 1. The table is being created and hits issues when creating the fs layout. > 2. The rollback then tries to delete the data from the fs. To delete the mob dir it > needs to ask HMaster for the HTableDescriptor, but by that time the table dir > has already been deleted and no HTableDescriptor can be found. > In this patch, it directly checks whether the mob directory exists instead of > checking the HTableDescriptor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14866) VerifyReplication should use peer configuration in peer connection
[ https://issues.apache.org/jira/browse/HBASE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038429#comment-15038429 ] Gary Helmling commented on HBASE-14866: --- bq. Should transformClusterKey just use standardizeZKQuorumServerString rather than having two different cases? Sure, I can clean that up along the way. bq. Seems like the entry for shouldbemissing will have two prefixes. Is that intended? Thanks, nice catch! I'll fix that. bq. why not move buildZKQuorumServerString and standardizeZKQuorumServerString into ZKUtil? ZKUtil deals mostly with ZK related operations (adding, watching znodes). Building a quorum string is how we handle multiple ZK configs in a Configuration. So it seems more configuration related than ZK operation related to me. In addition ZKUtil is in hbase-client so we can't use anything in it from hbase-common. We could move everything, including createClusterConf(), to ZKUtil, but that seems a weird home for it to me, since the original problem was a failure to handle additional configuration properties (beyond ZK quorum config) for target/destination clusters. bq. Most of these changes to HBaseConfiguration seem to be very replication specific. Should we have a different class for replication based configuration, so that HBaseConfiguration doesn't get too unwieldy? bq. Agreed. Maybe we need something like ReplicationUtils ? These changes go beyond replication usage. This is a common problem wherever a program needs to handle talking to two clusters with a single Configuration. This applies to CopyTable, SyncTable and TableOutputFormat, none of which assumes any replication configuration. By comparison, the replication code actually abstracts the usage of these ZK configuration utilities pretty well, except for the couple of problems in ReplicationAdmin and VerifyReplication. 
In the non-replication cases, we need to be able to handle: applying different ZK quorum configurations for the different clusters, and overriding other configuration properties (for example security-related config) for the other clusters. {{HBaseConfiguration.createClusterConf()}} is the cleanest way I can see of abstracting this, especially for all of the non-replication usage. This also seems like clearly a configuration problem to me, so HBaseConfiguration seems like the right home. That is how we handle creating new HBase configurations everywhere (via {{HBaseConfiguration.create()}}), so this seems analogous. If we're worried about bloating HBaseConfiguration with the additions moved from ZKUtil, then I could create a new util class in hbase-common to hold them, but I think we already have a proliferation of config related methods spread across multiple utility classes: * ConfigurationUtils in hbase-server -- I would put the methods there, but we need access to them from hbase-client and hbase-server, so hbase-common seems like the right home. ConfigurationUtils is annotated public, so we can't just move it without compatibility concerns. * ZKUtil in hbase-client -- this class deals mostly with operations on ZooKeeper (adding, watching znodes), so I think removing all the config methods actually made for a cleaner separation of ZK operations vs. configuration related manipulations. Since ZKUtil is in hbase-client we also can't depend on it from hbase-common. We could move it to hbase-common, but that would introduce a new dependency on ZooKeeper in hbase-common that is not currently there. * ZKConfig in hbase-client -- this currently deals with creating a ZK properties based configuration for HQuorumPeer. So again moving the methods there would be expanding what it currently handles and has the additional problem of being in hbase-client, so createClusterConf() would have to move there as well. 
It seems to me like we have two best options: * move the ZK related config options to a new private util class in hbase-common. This could even be ZKConfig, moved from hbase-client, since it's private. It would be an expansion of its current responsibilities, but doesn't seem too bad. * go back to the original targeted fixes to ReplicationAdmin and VerifyReplication, since those are the actual problems I'm trying to solve. What do you guys think? I'll hold off on further changes to this until we get some consensus. > VerifyReplication should use peer configuration in peer connection > -- > > Key: HBASE-14866 > URL: https://issues.apache.org/jira/browse/HBASE-14866 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14866.patch, HBASE-14866_v1.patch, > hbase-14866-v4.patch,
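The `createClusterConf()` idea under discussion — carry a second cluster's settings in one Configuration under a key prefix, then overlay them when connecting to that cluster — can be sketched with a plain Map standing in for Hadoop's Configuration. The prefix convention and method name here are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterConfSketch {
    // Copy the base conf, then let any "<prefix>.key" entry override "key",
    // yielding a conf suitable for connecting to the peer/destination cluster.
    public static Map<String, String> createClusterConf(Map<String, String> base, String prefix) {
        Map<String, String> merged = new HashMap<>(base);
        String p = prefix + ".";
        for (Map.Entry<String, String> e : base.entrySet()) {
            if (e.getKey().startsWith(p)) {
                merged.put(e.getKey().substring(p.length()), e.getValue());
            }
        }
        return merged;
    }
}
```

This covers both needs Gary names: a different ZK quorum for the peer cluster, and overriding arbitrary other properties (e.g. security config) without assuming anything replication-specific.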
[jira] [Commented] (HBASE-14900) Make global config option for ReplicationEndpoint
[ https://issues.apache.org/jira/browse/HBASE-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038343#comment-15038343 ] Andrew Purtell commented on HBASE-14900: Sounds good generically, do you have a patch we could look at? > Make global config option for ReplicationEndpoint > - > > Key: HBASE-14900 > URL: https://issues.apache.org/jira/browse/HBASE-14900 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Cody Marcel >Assignee: Cody Marcel >Priority: Minor > Fix For: 2.0.0 > > > Currently ReplicationEndpoint implementations can only be configured through > the HBase shell. We should be able to use a property in the hbase-site.xml > to globally set an alternate default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038340#comment-15038340 ] Andrew Purtell edited comment on HBASE-14822 at 12/3/15 7:03 PM: - bq. There's something with region replicas and the specific meaning of requesting 0 rows. Ah, I should have tested more than the 0.98 patch. bq. Seems like I should regroup and just add another flag to the scan PB request. Seems so was (Author: apurtell): bq. There's something with region replicas and the specific meaning of requesting 0 rows. Should have tested more than the 0.98 patch. bq. Seems like I should regroup and just add another flag to the scan PB request. Seems so > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14924) Slow response from HBASE REStful interface
[ https://issues.apache.org/jira/browse/HBASE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-14924. Resolution: Invalid This is the project development tracker. For user help and troubleshooting advice, please write to u...@hbase.apache.org. > Slow response from HBASE REStful interface > -- > > Key: HBASE-14924 > URL: https://issues.apache.org/jira/browse/HBASE-14924 > Project: HBase > Issue Type: Brainstorming > Components: REST >Affects Versions: 1.1.1 > Environment: IBM Biginsights 4.1 >Reporter: Moulay Amine Jaidi >Priority: Blocker > Labels: REST, hbase-rest, slow-scan > > We are currently experiencing an issue with HBase through the REST interface. > Previously we were on version 0.96 and were able to run the following REST > command successfully and very quickly > http://10.92.211.22:60800/tableName/RAWKEY.* > At the moment, after upgrading to 1.1.1, this request takes a lot longer > to retrieve results (count is 12 items to return) > Are there any configurations or known issues that may affect this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14907) NPE of MobUtils.hasMobColumns in Build failed in Jenkins: HBase-Trunk_matrix » latest1.8,Hadoop #513
[ https://issues.apache.org/jira/browse/HBASE-14907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038407#comment-15038407 ] Matteo Bertozzi commented on HBASE-14907: - +1 on v2 > NPE of MobUtils.hasMobColumns in Build failed in Jenkins: HBase-Trunk_matrix > » latest1.8,Hadoop #513 > > > Key: HBASE-14907 > URL: https://issues.apache.org/jira/browse/HBASE-14907 > Project: HBase > Issue Type: Bug > Components: mob >Reporter: Jingcheng Du >Assignee: Jingcheng Du > Attachments: HBASE-14907-V2.patch, HBASE-14907.patch > > > An NPE is thrown when rolling back a failed table creation: > 1. The table is being created and hits issues when creating the fs layout. > 2. The rollback then tries to delete the data from the fs. To delete the mob dir it > needs to ask HMaster for the HTableDescriptor, but by that time the table dir > has already been deleted and no HTableDescriptor can be found. > In this patch, it directly checks whether the mob directory exists instead of > checking the HTableDescriptor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14922: -- Fix Version/s: 1.3.0 1.2.0 2.0.0 Affects Version/s: 1.2.0 2.0.0 1.1.2 Status: Patch Available (was: Open) > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 1.1.2, 2.0.0, 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14922-v1.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
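One common mitigation for the storm described in HBASE-14922 — an assumption here, not necessarily exactly what the attached patch does — is to add random jitter to each periodic flush delay, so region servers that started together spread their flushes out instead of queueing them at the same instant:

```java
import java.util.Random;

public class FlushJitter {
    // Base delay plus a random fraction of it; with e.g. 20% jitter, servers
    // started simultaneously will flush at different times within the window.
    public static long jitteredDelayMs(long baseDelayMs, double jitterFraction, Random rng) {
        long range = (long) (baseDelayMs * jitterFraction);
        return baseDelayMs + (range <= 0 ? 0 : (long) (rng.nextDouble() * range));
    }

    public static void main(String[] args) {
        // Hourly flush check with up to 20% jitter
        System.out.println(jitteredDelayMs(3_600_000L, 0.2, new Random()));
    }
}
```

The `Delayed` queueing the issue mentions only defers the flush; jitter is what actually decorrelates the servers.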
[jira] [Updated] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14922: -- Component/s: regionserver > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14922-v1.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14869) Better request latency histograms
[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038333#comment-15038333 ] Andrew Purtell commented on HBASE-14869: The latest patches look good to me. Are we sure the new output is consumable and useful for the intended purpose [~lhofhansl] [~vik.karma] ? Maybe try this in a test environment (for our purposes, with Splunk)? > Better request latency histograms > - > > Key: HBASE-14869 > URL: https://issues.apache.org/jira/browse/HBASE-14869 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Fix For: 2.0.0, 1.3.0, 0.98.17 > > Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, > 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, > 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png > > > I just discussed this with a colleague. > The get, put, etc, histograms that each region server keeps are somewhat > useless (depending on what you want to achieve of course), as they are > aggregated and calculated by each region server. > It would be better to record the number of requests in certain latency > bands in addition to what we do now. > For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, > 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be > configurable). > That way we can do further calculations after the fact, and answer questions > like: How often did we miss our SLA? Percentage of requests that missed an > SLA, etc. > Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
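The banding Lars describes is just a counter per configurable bucket plus an overflow bucket for requests beyond the last edge. A minimal sketch — band edges taken from the example in the comment; the class and method names are hypothetical, not the patch's metrics API:

```java
import java.util.Arrays;

public class LatencyBands {
    private final long[] upperBoundsMs;   // exclusive upper edge of each band
    private final long[] counts;          // one extra slot: the overflow band

    public LatencyBands(long[] upperBoundsMs) {
        this.upperBoundsMs = upperBoundsMs.clone();
        this.counts = new long[upperBoundsMs.length + 1];
    }

    // Find the first band whose upper edge exceeds the latency and count it.
    public void record(long latencyMs) {
        int i = 0;
        while (i < upperBoundsMs.length && latencyMs >= upperBoundsMs[i]) {
            i++;
        }
        counts[i]++;
    }

    public long[] snapshot() {
        return counts.clone();
    }

    public static void main(String[] args) {
        LatencyBands h = new LatencyBands(new long[]{5, 10, 20, 50, 100, 1000});
        for (long ms : new long[]{2, 7, 15, 400, 3000}) h.record(ms);
        System.out.println(Arrays.toString(h.snapshot())); // [1, 1, 1, 0, 0, 1, 1]
    }
}
```

Raw per-band counts (unlike server-side percentiles) can be aggregated across region servers after the fact, which is what makes "how often did we miss the SLA" answerable.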
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038340#comment-15038340 ] Andrew Purtell commented on HBASE-14822: bq. There's something with region replicas and the specific meaning of requesting 0 rows. Should have tested more than the 0.98 patch. bq. Seems like I should regroup and just add another flag to the scan PB request. Seems so > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14903) Table Or Region?
[ https://issues.apache.org/jira/browse/HBASE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038402#comment-15038402 ] Andrew Purtell commented on HBASE-14903: bq. I think this sentence "When a table is in the process of splitting," should be "When a Region is in the process of splitting," on chapter 【62.2. hbase:meta】 Yes bq. By the way, is this document the latest?【http://hbase.apache.org/book.html#arch.overview】I will translate it! Yes, thanks! > Table Or Region? > > > Key: HBASE-14903 > URL: https://issues.apache.org/jira/browse/HBASE-14903 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: 胡托 >Priority: Blocker > > I've been reading the latest Reference Guide and trying to translate it into > Chinese! > I think this sentence "When a table is in the process of splitting," > should be "When a Region is in the process of splitting," on chapter 【62.2. > hbase:meta】。 > By the way, is this document the > latest?【http://hbase.apache.org/book.html#arch.overview】I will translate it! -- This message was sent by Atlassian JIRA (v6.3.4#6332)