[jira] [Created] (HBASE-5454) Refuse operations from Admin before master is initialized
Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen In our testing environment, while the master was initializing, we found conflicts between master#assignAllUserRegions and an EnableTable event: region assignment threw an exception and the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc., until the master is initialized; that would reduce such errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
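The proposal amounts to gating every admin-facing master call on an "initialized" flag. A minimal sketch of such a guard follows; the class and method names are illustrative assumptions, since the actual patch is not shown in this thread:

```java
// Illustrative sketch only -- the real hbase-5454.patch is not included here.
// The idea: every admin-facing master RPC first checks an "initialized" flag
// and rejects the call while assignAllUserRegions has not finished yet.
public class MasterAdminGuard {
  private volatile boolean initialized = false; // flipped to true at the end of master startup

  void setInitialized() {
    initialized = true;
  }

  /** Hypothetical helper called at the top of createTable, enableTable, etc. */
  void checkInitialized() throws java.io.IOException {
    if (!initialized) {
      throw new java.io.IOException("Master is still initializing; retry the admin operation later");
    }
  }
}
```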
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213448#comment-13213448 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 ideally, we need two different fs. The first fs is for writing and reading-with-hdfs-checksums. The other fs is for reading-without-hdfs. src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:129 done src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 The HFile layer is the one that is responsible for opening a file for reading. Then the multi-threaded HFileBlockLayer uses those FSDataInputStream to pread data from HDFS. So, I need to make the HFile layer open two file descriptors for the same file, both for reading purposes... one which checksum and the other without checksums src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 This is a protected member, so users of this class are not concerned on what this is. If you have a better structure on how to organize this one, please do let me know src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 The Checksum API returns a long. But actual implementations like CRC32, CRC32C, etc all return an int. Also, the Hadoop checksum implementation also uses a 4 byte value. If you think that we should store 8 byte checksums, I can do that. But for the common case, we will be wasting 4 bytes in the header for every checksum chunk src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:205 done REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
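The 4-byte-versus-8-byte point raised for ChecksumUtil.java can be seen directly with the JDK classes involved: java.util.zip.Checksum reports a long, but a CRC32 value always fits in 32 bits, so storing an int per checksum chunk loses nothing. A small self-contained illustration (not the patch code):

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Demonstrates that a CRC32 checksum, although reported as a long by the
// Checksum API, fits in a 4-byte field, which is why an 8-byte header slot
// per checksum chunk would waste 4 bytes.
public class ChecksumWidthDemo {
  public static void main(String[] args) {
    byte[] chunk = "example hbase block bytes".getBytes();
    Checksum crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    long asLong = crc.getValue();  // the API returns a long...
    int asInt = (int) asLong;      // ...but the CRC32 value fits in 32 bits
    System.out.printf("long=%d int=%d lossless=%b%n",
        asLong, asInt, (asInt & 0xFFFFFFFFL) == asLong);
  }
}
```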
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5074: --- Attachment: D1521.7.patch dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Reviewers: mbautin Incorporated Stacks's review comments. REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/fs src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: 
HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
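The two-filesystem idea from the review comments, roughly: one FileSystem handle keeps HDFS checksum verification on (for writes and fallback reads), while a second has it switched off so that reads verified by HBase's own block checksums cost a single data iop instead of two. A hedged sketch using the stock Hadoop FileSystem API; the field and method names are assumptions, not the committed HFileSystem code:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Rough sketch of keeping two FileSystem handles: one that still verifies
// HDFS checksums, and one with verification disabled for reads where HBase
// checks its own block-level checksums (so the .crc file is not touched).
public class TwoFsSketch {
  private final FileSystem fs;           // writes + reads with HDFS checksums
  private final FileSystem noChecksumFs; // reads where HBase verifies checksums itself

  public TwoFsSketch(Configuration conf) throws IOException {
    this.fs = FileSystem.get(conf);
    this.noChecksumFs = FileSystem.newInstance(conf);
    this.noChecksumFs.setVerifyChecksum(false); // skip the extra checksum-file iop
  }

  public FileSystem getReadFs(boolean hbaseChecksumsEnabled) {
    return hbaseChecksumsEnabled ? noChecksumFs : fs;
  }
}
```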
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213451#comment-13213451 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 ideally, we need two different fs. The first fs is for writing and reading-with-hdfs-checksums. The other fs is for reading-without-hdfs. src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:129 done src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 The HFile layer is the one that is responsible for opening a file for reading. Then the multi-threaded HFileBlockLayer uses those FSDataInputStream to pread data from HDFS. So, I need to make the HFile layer open two file descriptors for the same file, both for reading purposes... one which checksum and the other without checksums src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 This is a protected member, so users of this class are not concerned on what this is. If you have a better structure on how to organize this one, please do let me know src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 The Checksum API returns a long. But actual implementations like CRC32, CRC32C, etc all return an int. Also, the Hadoop checksum implementation also uses a 4 byte value. If you think that we should store 8 byte checksums, I can do that. But for the common case, we will be wasting 4 bytes in the header for every checksum chunk src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java:205 done REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5454: Attachment: hbase-5454.patch Refuse operations from Admin before master is initialized Key: HBASE-5454 URL: https://issues.apache.org/jira/browse/HBASE-5454 Project: HBase Issue Type: Improvement Reporter: chunhui shen Attachments: hbase-5454.patch In our testing environment, while the master was initializing, we found conflicts between master#assignAllUserRegions and an EnableTable event: region assignment threw an exception and the master aborted itself. We think we'd better refuse operations from Admin, such as CreateTable, EnableTable, etc., until the master is initialized; that would reduce such errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213468#comment-13213468 ] Max Lapan commented on HBASE-5416: -- The problem turned out to be a little trickier than I expected. The failed tests are caused by PageFilter and WhileMatchFilter expecting the filterRow method to be called only once per non-empty row. The previous version of the patch broke this, so those tests failed. I resolved this by checking that the row is not empty right before filterRow(List) is called, but that requires slightly modifying the SingleColumnValueExcludeFilter logic - moving the exclude phase from the filterKeyValue method to filterRow(List). The main reason is that, at the RegionScanner::nextInternal level, there is no way to distinguish a row which is empty because the filter accepted it but excluded all of its KVs from a row which is empty because the filter rejected it. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
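The change described in the comment, moving the exclusion out of filterKeyValue so that an empty result list can only mean the row was rejected, can be pictured with a small standalone sketch (illustrative only, not the patched SingleColumnValueExcludeFilter):

```java
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

// Instead of dropping the tested column while individual KeyValues are scanned
// (filterKeyValue), the exclusion runs once per accepted row in filterRow(List),
// so a row that comes out empty can only mean the filter rejected it.
public class ExcludeColumnOnFilterRow {
  private final byte[] family;
  private final byte[] qualifier;

  public ExcludeColumnOnFilterRow(byte[] family, byte[] qualifier) {
    this.family = family;
    this.qualifier = qualifier;
  }

  /** Remove the filtered column from the row's KeyValues after the row was accepted. */
  public void filterRow(List<KeyValue> kvs) {
    Iterator<KeyValue> it = kvs.iterator();
    while (it.hasNext()) {
      KeyValue kv = it.next();
      if (Bytes.equals(kv.getFamily(), family) && Bytes.equals(kv.getQualifier(), qualifier)) {
        it.remove();
      }
    }
  }
}
```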
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-5416: - Status: Open (was: Patch Available) Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213470#comment-13213470 ] Hadoop QA commented on HBASE-5074: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515551/D1521.7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 55 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -132 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.util.TestFSUtils org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1006//console This message is automatically generated. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-5416: - Attachment: Filtered_scans_v2.patch Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch, Filtered_scans_v2.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-5416: - Status: Patch Available (was: Open) Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch, Filtered_scans_v2.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5422: Attachment: hbase-5422v3.patch Thanks for Ted's review. StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start processing openRegion, causing timeout on RIT in master for the region.
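The delay claimed at the end of the description checks out against the quoted regionserver log lines: the open request is received at 18:07:43,537 but processing only starts at 18:11:16,560, roughly 3 minutes 33 seconds later, which exceeds the 3-minute RIT timeout in the summary. A quick verification of that arithmetic:

```java
import java.time.Duration;
import java.time.LocalTime;

// Computes the gap between the two regionserver log timestamps quoted above
// and compares it with the 180-second (3-minute) RIT timeout.
public class OpenDelayCheck {
  public static void main(String[] args) {
    LocalTime received = LocalTime.parse("18:07:43.537");   // "Received request to open region"
    LocalTime processed = LocalTime.parse("18:11:16.560");  // "Processing open of ..."
    Duration gap = Duration.between(received, processed);
    System.out.println("gap = " + gap.getSeconds() + "s, RIT timeout = 180s"); // gap = 213s
  }
}
```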
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-5422: Attachment: 5422-90v3.patch StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start processing openRegion, causing timeout on RIT in master for the region. Let's see the code of
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213478#comment-13213478 ] chunhui shen commented on HBASE-5422: - Amend it in patch v3 StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start processing openRegion, causing timeout
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213479#comment-13213479 ] Hadoop QA commented on HBASE-5416: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515558/Filtered_scans_v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 151 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.filter.TestSingleColumnValueExcludeFilter Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//console This message is automatically generated. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch, Filtered_scans_v2.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-5416: - Status: Open (was: Patch Available) Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: Filtered_scans.patch, Filtered_scans_v2.patch When the scan is performed, whole row is loaded into result list, after that filter (if exists) is applied to detect that row is needed. But when scan is performed on several CFs and filter checks only data from the subset of these CFs, data from CFs, not checked by a filter is not needed on a filter stage. Only when we decided to include current row. And in such case we can significantly reduce amount of IO performed by a scan, by loading only values, actually checked by a filter. For example, we have two CFs: flags and snap. Flags is quite small (bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and it is quite costly to scan it. If we needed only rows with some flag specified, we use SingleColumnValueFilter to limit result to only small subset of region. But current implementation is loading both CFs to perform scan, when only small subset is needed. Attached patch adds one routine to Filter interface to allow filter to specify which CF is needed to it's operation. In HRegion, we separate all scanners into two groups: needed for filter and the rest (joined). When new row is considered, only needed data is loaded, filter applied, and only if filter accepts the row, rest of data is loaded. At our data, this speeds up such kind of scans 30-50 times. Also, this gives us the way to better normalize the data into separate columns by optimizing the scans performed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation
[ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-3484: --- Attachment: hierarchical-map.txt Here's something I hacked together tonight which makes the memstore maps hierarchical. It should save a bit of CPU especially when doing wide puts, but I haven't done any serious benchmarking. It probably has negative memory effects in its current incarnation. Seems to kind-of work. Replace memstore's ConcurrentSkipListMap with our own implementation Key: HBASE-3484 URL: https://issues.apache.org/jira/browse/HBASE-3484 Project: HBase Issue Type: Improvement Components: performance Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Attachments: hierarchical-map.txt By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore: - add an iterator.replace() method which should allow us to do upsert much more cheaply - implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
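The "hierarchical" layout Todd describes, an outer map keyed by row with a per-row inner map keyed by column so a wide put pays the full row-key comparison only once, can be sketched with plain JDK types. This is a simplified illustration with String/byte[] instead of KeyValue, not the attached hierarchical-map.txt:

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Outer skip list keyed by row, per-row inner map keyed by column. A put into
// a wide row only locates the row entry once instead of comparing the full
// row key for every column.
public class HierarchicalMemstoreSketch {
  private final ConcurrentNavigableMap<String, ConcurrentNavigableMap<String, byte[]>> rows =
      new ConcurrentSkipListMap<String, ConcurrentNavigableMap<String, byte[]>>();

  public void put(String row, String column, byte[] value) {
    ConcurrentNavigableMap<String, byte[]> cols = rows.get(row);
    if (cols == null) {
      cols = new ConcurrentSkipListMap<String, byte[]>();
      ConcurrentNavigableMap<String, byte[]> prev = rows.putIfAbsent(row, cols);
      if (prev != null) {
        cols = prev; // another thread created the row map first
      }
    }
    cols.put(column, value);
  }

  public byte[] get(String row, String column) {
    ConcurrentNavigableMap<String, byte[]> cols = rows.get(row);
    return cols == null ? null : cols.get(column);
  }
}
```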
[jira] [Updated] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5441: -- Status: Patch Available (was: Open) HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.<init>(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.<init>(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.<init>(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
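One obvious way around the race, retrying the HBaseAdmin construction until the master answers rather than failing the thrift handler on the first ServerNotRunningYetException, is sketched below. The attached patches may well solve it differently (for example by creating the admin lazily); the class name and parameters here are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Retry HBaseAdmin construction a bounded number of times, sleeping between
// attempts, so the thrift handler can come up even if the master is not yet
// running when the regionserver starts its threads.
public class RetryingAdminFactory {
  public static HBaseAdmin createWithRetries(Configuration conf, int attempts, long sleepMs)
      throws Exception {
    for (int i = 0; ; i++) {
      try {
        return new HBaseAdmin(conf); // throws while the master is not running yet
      } catch (Exception e) {
        if (i >= attempts - 1) {
          throw e; // give up after the last attempt
        }
        Thread.sleep(sleepMs);
      }
    }
  }
}
```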
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5422: -- Status: Patch Available (was: Open) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start processing openRegion, causing timeout on RIT in master for the region. Let's see the
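The 3-minute figure in the summary is the master's regions-in-transition timeout, after which the TimeoutMonitor reassigns a region still stuck in PENDING_OPEN. The attached patches change the bulk-assignment behaviour itself; purely to illustrate the knob involved, the sketch below raises that timeout programmatically. The property name is assumed to be the 0.90/0.92-era key and should be verified against hbase-default.xml; raising it only buys headroom and is not the fix this issue implements.

{code:java}
// A minimal sketch, assuming the 0.90/0.92-era property name
// "hbase.master.assignment.timeoutmonitor.timeout" (default 180000 ms = 3 min).
// Raising it gives slow-opening regions more time before the TimeoutMonitor
// reassigns them; it does not address the bulk-assignment churn itself.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RitTimeoutConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.master.assignment.timeoutmonitor.timeout", 600000L);
    System.out.println("RIT timeout (ms): "
        + conf.getLong("hbase.master.assignment.timeoutmonitor.timeout", 180000L));
  }
}
{code}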
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213556#comment-13213556 ] Hadoop QA commented on HBASE-5422: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515563/5422-90v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1009//console This message is automatically generated. StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213554#comment-13213554 ] Hadoop QA commented on HBASE-5441: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515545/HBASE-5441.D1857.3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 152 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.thrift.TestThriftServer org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1008//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1008//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1008//console This message is automatically generated. HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. 
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
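The stack trace shows HBaseAdmin being constructed while the regionserver's threads are being initialized, before the master is reachable. A minimal sketch of the lazy-initialization idea the patches pursue is shown below; the class and method names are illustrative only and are not the actual ThriftServerRunner code.

{code:java}
// A minimal sketch of lazy HBaseAdmin creation; names are illustrative only.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LazyAdminHolder {
  private final Configuration conf;
  private HBaseAdmin admin;           // created on first use, not at startup

  public LazyAdminHolder(Configuration conf) {
    this.conf = conf;                 // no RPC to the master happens here
  }

  public synchronized HBaseAdmin getAdmin() throws IOException {
    if (admin == null) {
      // The HBaseAdmin constructor contacts the master; deferring it until a
      // request actually needs admin operations lets the thrift handler start
      // even while the master is still initializing.
      admin = new HBaseAdmin(conf);
    }
    return admin;
  }
}
{code}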
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5422: -- Attachment: 5422-v3.txt Chunhui's patch v3. StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, 5422-v3.txt, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start processing openRegion, causing timeout on RIT in master
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5422: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515563/5422-90v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1009//console This message is automatically generated.) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, 5422-v3.txt, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213716#comment-13213716 ] Phabricator commented on HBASE-5441: tedyu has accepted the revision HBASE-5441 [jira] HRegionThriftServer may not start because of a race-condition. REVISION DETAIL https://reviews.facebook.net/D1857 BRANCH hb-5441 HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5437) HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy
[ https://issues.apache.org/jira/browse/HBASE-5437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213714#comment-13213714 ] Zhihong Yu commented on HBASE-5437: --- @Scott: I think D1857 is a cross-post: I don't find Hbase.Iface.class in the patch. It is for HBASE-5441 HRegionThriftServer does not start because of a bug in HbaseHandlerMetricsProxy --- Key: HBASE-5437 URL: https://issues.apache.org/jira/browse/HBASE-5437 Project: HBase Issue Type: Bug Components: metrics, thrift Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.94.0 Attachments: HBASE-5437.D1857.1.patch 3.facebook.com,60020,1329865516120: Initialization of RS failed. Hence aborting RS. java.lang.ClassCastException: $Proxy9 cannot be cast to org.apache.hadoop.hbase.thrift.generated.Hbase$Iface at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.newInstance(HbaseHandlerMetricsProxy.java:47) at org.apache.hadoop.hbase.thrift.ThriftServerRunner.init(ThriftServerRunner.java:239) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 15:05:18,749 FATAL org.apache.hadoop.h -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
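The ClassCastException above ($Proxy9 cannot be cast to Hbase$Iface) is the generic java.lang.reflect.Proxy pitfall: a dynamic proxy implements only the interfaces it was created with, so the cast target must be among them. A self-contained illustration follows; it is not the HbaseHandlerMetricsProxy code.

{code:java}
// Generic demonstration: a dynamic proxy can only be cast to an interface it
// was created with. Not the HbaseHandlerMetricsProxy code.
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyCastDemo {
  interface Iface { String ping(); }
  interface Other { void noop(); }

  public static void main(String[] args) {
    InvocationHandler handler = new InvocationHandler() {
      @Override
      public Object invoke(Object proxy, Method method, Object[] a) {
        return "pong";
      }
    };

    // Wrong: proxy built for Other, then cast to Iface -> ClassCastException,
    // the same shape of failure seen in the regionserver log above.
    Object wrong = Proxy.newProxyInstance(ProxyCastDemo.class.getClassLoader(),
        new Class<?>[] { Other.class }, handler);
    // Iface broken = (Iface) wrong;   // would throw ClassCastException

    // Right: pass the interface that callers will cast the proxy to.
    Iface ok = (Iface) Proxy.newProxyInstance(ProxyCastDemo.class.getClassLoader(),
        new Class<?>[] { Iface.class }, handler);
    System.out.println(ok.ping());
  }
}
{code}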
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213719#comment-13213719 ] Hadoop QA commented on HBASE-5422: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515596/5422-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 151 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1010//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1010//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1010//console This message is automatically generated. StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, 5422-v3.txt, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213730#comment-13213730 ] Phabricator commented on HBASE-5441: tedyu has commented on the revision HBASE-5441 [jira] HRegionThriftServer may not start because of a race-condition. Hadoop QA reported the following in test failures: java.lang.NullPointerException at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getHBaseAdmin(ThriftServerRunner.java:502) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.getTableNames(ThriftServerRunner.java:563) at org.apache.hadoop.hbase.thrift.TestThriftServer.createTestTables(TestThriftServer.java:169) at org.apache.hadoop.hbase.thrift.TestThriftServer.doTestTableCreateDrop(TestThriftServer.java:119) REVISION DETAIL https://reviews.facebook.net/D1857 BRANCH hb-5441 HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213735#comment-13213735 ] Zhihong Yu commented on HBASE-5422: --- @Stack: Do you want to take a look at patch v3 ? StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, 5422-v3.txt, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Through the RS's log, we could find it is larger than 3mins from receive openRegion request to start
[jira] [Updated] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4403: --- Attachment: hbase-4403-interface_v3.txt hbase-4403.patch Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4403: --- Fix Version/s: 0.94.0 Status: Patch Available (was: Open) Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.92.0, 0.90.5 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
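For reference, a hedged sketch of what the adopted classification looks like on a class. The annotation names follow Hadoop's org.apache.hadoop.classification package; the exact package HBase ends up shipping is an assumption to check against the committed patch.

{code:java}
// Sketch only: annotation package assumed to follow Hadoop's
// org.apache.hadoop.classification; verify against the committed HBase patch.
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public        // meant to be used by HBase clients
@InterfaceStability.Evolving     // API may still change between minor releases
public class ExampleClientFacingClass {
  // client-facing methods ...
}

@InterfaceAudience.Private       // internal to the master/regionserver
class ExampleInternalHelper {
  // internal methods ...
}
{code}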
[jira] [Updated] (HBASE-5422) StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins)
[ https://issues.apache.org/jira/browse/HBASE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5422: - Resolution: Fixed Fix Version/s: 0.92.1 Assignee: chunhui shen Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92 branch and to trunk. Thanks for the patch Chunhui. StartupBulkAssigner would cause a lot of timeout on RIT when assigning large numbers of regions (timeout = 3 mins) -- Key: HBASE-5422 URL: https://issues.apache.org/jira/browse/HBASE-5422 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.1 Attachments: 5422-90.patch, 5422-90v2.patch, 5422-90v3.patch, 5422-v3.txt, hbase-5422.patch, hbase-5422v2.patch, hbase-5422v3.patch In our produce environment We find a lot of timeout on RIT when cluster up, there are about 7w regions in the cluster( 25 regionservers ). First, we could see the following log:(See the region 33cf229845b1009aa8a3f7b0f85c9bd0) master's log 2012-02-13 18:07:41,409 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Async create of unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 with OFFLINE state 2012-02-13 18:07:42,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409, server=r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:07:42,996 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=OFFLINE, ts=1329127661409 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329127662996 2012-02-13 18:10:48,072 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,744 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=r03f11025.yh.aliyun.com,60020,1329127549907, region=33cf229845b1009aa8a3f7b0f85c9bd0 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 33cf229845b1009aa8a3f7b0f85c9bd0; deleting unassigned node 2012-02-13 18:38:07,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Deleting existing unassigned node for 33cf229845b1009aa8a3f7b0f85c9bd0 that is in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x348f4a94723da5 Successfully deleted unassigned node for region 33cf229845b1009aa8a3f7b0f85c9bd0 in expected state RS_ZK_REGION_OPENED 2012-02-13 18:38:07,573 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. on r03f11025.yh.aliyun.com,60020,1329127549907 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 
so generated a random one; hri=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0., src=, dest=r01b05043.yh.aliyun.com,60020,1329127549041; 29 (online=29, exclude=null) available servers 2012-02-13 18:50:54,428 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. to r01b05043.yh.aliyun.com,60020,1329127549041 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. state=PENDING_OPEN, ts=1329132528086 2012-02-13 19:31:50,514 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. Regionserver's log 2012-02-13 18:07:43,537 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: item_20120208,\x009,1328794343859.33cf229845b1009aa8a3f7b0f85c9bd0. 2012-02-13 18:11:16,560 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing
[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop
[ https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213789#comment-13213789 ] Hadoop QA commented on HBASE-4403: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515619/hbase-4403-interface_v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1011//console This message is automatically generated. Adopt interface stability/audience classifications from Hadoop -- Key: HBASE-4403 URL: https://issues.apache.org/jira/browse/HBASE-4403 Project: HBase Issue Type: Task Affects Versions: 0.90.5, 0.92.0 Reporter: Todd Lipcon Assignee: Jimmy Xiang Fix For: 0.94.0 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, hbase-4403.patch As HBase gets more widely used, we need to be more explicit about which APIs are stable and not expected to break between versions, which APIs are still evolving, etc. We also have many public classes that are really internal to the RS or Master and not meant to be used by users. Hadoop has adopted a classification scheme for audience (public, private, or limited-private) as well as stability (stable, evolving, unstable). I think we should copy these annotations to HBase and start to classify our public classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5166) MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop
[ https://issues.apache.org/jira/browse/HBASE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213793#comment-13213793 ] jirapos...@reviews.apache.org commented on HBASE-5166: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3995/#review5268 --- Quite a few white spaces need to be removed. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11536 Should read 'MultithreadedTableMapper instances' /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11508 Leave a space between while and ( Another space between ) and { /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11537 Can we give better progress information here ? /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11535 Long line, please wrap to 80 chars. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11534 This if block can be an else to the if block above. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java https://reviews.apache.org/r/3995/#comment11533 Please remove white space. - Ted On 2012-02-22 07:20:13, Jai Singh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3995/ bq. --- bq. bq. (Updated 2012-02-22 07:20:13) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. bq. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. bq. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound). bq. bq. Moreover, I want to know whether It would be a good/bad idea to use HBase for these kind of usecases ?. bq. bq. bq. Diffs bq. - bq. bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/MultithreadedTableMapper.java PRE-CREATION bq. /src/test/java/org/apache/hadoop/hbase/mapreduce/TestMulitthreadedTableMapper.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3995/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Jai bq. bq. MultiThreaded Table Mapper analogous to MultiThreaded Mapper in hadoop -- Key: HBASE-5166 URL: https://issues.apache.org/jira/browse/HBASE-5166 Project: HBase Issue Type: Improvement Reporter: Jai Kumar Singh Priority: Minor Labels: multithreaded, tablemapper Attachments: 0001-Added-MultithreadedTableMapper-HBASE-5166.patch, 0003-Added-MultithreadedTableMapper-HBASE-5166.patch, 0005-HBASE-5166-Added-MultithreadedTableMapper.patch, 0006-HBASE-5166-Added-MultithreadedTableMapper.patch Original Estimate: 0.5h Remaining Estimate: 0.5h There is no MultiThreadedTableMapper in hbase currently just like we have a MultiThreadedMapper in Hadoop for IO Bound Jobs. UseCase, webcrawler: take input (urls) from a hbase table and put the content (urls, content) back into hbase. Running these kind of hbase mapreduce job with normal table mapper is quite slow as we are not utilizing CPU fully (N/W IO Bound). 
Moreover, I want to know whether it would be a good or bad idea to use HBase for this kind of use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
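A hedged sketch of how such an IO-bound job might be wired up, modeled on Hadoop's MultithreadedMapper. The setMapperClass/setNumberOfThreads setters and the UrlFetchMapper class are assumptions (the MultithreadedTableMapper class is still under review above), so the final patch is authoritative for the real method names.

{code:java}
// Sketch of a driver for a network-bound crawl job; setMapperClass and
// setNumberOfThreads are assumed by analogy with Hadoop's MultithreadedMapper,
// and UrlFetchMapper is a hypothetical user mapper.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class CrawlJobDriver {

  // Hypothetical mapper doing the network-bound work for one row.
  public static class UrlFetchMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // fetch the URL stored in this row and emit the content (omitted)
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "url-fetch");
    job.setJarByClass(CrawlJobDriver.class);

    // Register the multithreaded wrapper as the job's mapper; it fans each
    // input split out to a pool of threads running the real mapper.
    TableMapReduceUtil.initTableMapperJob("urls", new Scan(),
        MultithreadedTableMapper.class, ImmutableBytesWritable.class,
        Result.class, job);
    MultithreadedTableMapper.setMapperClass(job, UrlFetchMapper.class);
    MultithreadedTableMapper.setNumberOfThreads(job, 10);

    job.waitForCompletion(true);
  }
}
{code}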
[jira] [Assigned] (HBASE-5251) Some commands return 0 rows when 0 rows were processed successfully
[ https://issues.apache.org/jira/browse/HBASE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reassigned HBASE-5251: -- Assignee: Himanshu Vashishtha Some commands return 0 rows when 0 rows were processed successfully --- Key: HBASE-5251 URL: https://issues.apache.org/jira/browse/HBASE-5251 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.5 Reporter: David S. Wang Assignee: Himanshu Vashishtha Priority: Minor Labels: noob From the hbase shell, I see this: hbase(main):049:0 scan 't1' ROW COLUMN+CELL r1 column=f1:c1, timestamp=1327104295560, value=value r1 column=f1:c2, timestamp=1327104330625, value=value 1 row(s) in 0.0300 seconds hbase(main):050:0 deleteall 't1', 'r1' 0 row(s) in 0.0080 seconds == I expected this to read 2 row(s) hbase(main):051:0 scan 't1' ROW COLUMN+CELL 0 row(s) in 0.0090 seconds I expected the deleteall command to return 1 row(s) instead of 0, because 1 row was deleted. Similar behavior for delete and some other commands. Some commands such as put work fine. Looking at the ruby shell code, it seems that formatter.footer() is called even for commands that will not actually increment the number of rows reported, such as deletes. Perhaps there should be another similar function to formatter.footer(), but that will not print out @row_count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-4348) Add metrics for regions in transition
[ https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reassigned HBASE-4348: -- Assignee: Himanshu Vashishtha Add metrics for regions in transition - Key: HBASE-4348 URL: https://issues.apache.org/jira/browse/HBASE-4348 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Himanshu Vashishtha Priority: Minor Labels: noob The following metrics would be useful for monitoring the master: - the number of regions in transition - the number of regions in transition that have been in transition for more than a minute - how many seconds has the oldest region-in-transition been in transition -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
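A minimal sketch of how the three proposed gauges could be derived from a map of region name to the time the region entered transition; the class and field names are illustrative and are not the AssignmentManager internals.

{code:java}
// A minimal sketch: region name -> time it entered transition, plus the three
// gauges described in the issue. Names are illustrative only.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegionsInTransitionGauges {
  private final Map<String, Long> ritStartTimes =
      new ConcurrentHashMap<String, Long>();

  /** Number of regions currently in transition. */
  public int ritCount() {
    return ritStartTimes.size();
  }

  /** Number of regions in transition longer than thresholdMs (e.g. 60000 ms). */
  public int ritCountOverThreshold(long now, long thresholdMs) {
    int count = 0;
    for (long start : ritStartTimes.values()) {
      if (now - start > thresholdMs) {
        count++;
      }
    }
    return count;
  }

  /** Age in seconds of the oldest region in transition, 0 if none. */
  public long ritOldestAgeSeconds(long now) {
    long oldest = now;
    for (long start : ritStartTimes.values()) {
      oldest = Math.min(oldest, start);
    }
    return (now - oldest) / 1000;
  }
}
{code}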
[jira] [Commented] (HBASE-5433) [REST] Add metrics to keep track of success/failure count
[ https://issues.apache.org/jira/browse/HBASE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213818#comment-13213818 ] Andrew Purtell commented on HBASE-5433: --- +1 on the patch. Hudson bot failures seem unrelated. [REST] Add metrics to keep track of success/failure count - Key: HBASE-5433 URL: https://issues.apache.org/jira/browse/HBASE-5433 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Labels: noob Fix For: 0.94.0 Attachments: HBASE-5433.trunk.v1.patch In a production environment, the visibility of successful REST request(s) are not getting exposed to metric system as we have only one metric (requests) today. Proposing to add more metrics such as successful_get_count, failed_get_count, successful_put_count, failed_put_count The current implementation increases the request count at the beginning of the method implementation and it is very hard to monitor requests (unless turn on debug, find the row_key and validate it in get/scan using hbase shell), it will be very useful to ops to keep an eye as requests from cross data-centers are trying to write data to one cluster using REST gateway through load balancer (and there is no visibility of which REST-server/RS failed to write data) {code} Response update(final CellSetModel model, final boolean replace) { // for requests servlet.getMetrics().incrementRequests(1); .. .. table.put(puts); table.flushCommits(); ResponseBuilder response = Response.ok(); // for successful_get_count servlet.getMetrics().incrementSuccessfulGetRequests(1); return response.build(); } catch (IOException e) { // for failed_get_count servlet.getMetrics().incrementFailedGetRequests(1); throw new WebApplicationException(e, Response.Status.SERVICE_UNAVAILABLE); } finally { } } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213820#comment-13213820 ] Phabricator commented on HBASE-5357: mbautin has commented on the revision [jira] [HBASE-5357] Refactoring: use the builder pattern for HColumnDescriptor. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java:1119 Not sure, but it is better to deal with changing the default number of versions in ROOT or META in a separate patch. This one is just a refactoring. src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java:1131 See my comment above. src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java:489 Sure, I'll update setter javadocs. REVISION DETAIL https://reviews.facebook.net/D1851 Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
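A self-contained illustration of the builder pattern the issue argues for, with every parameter on its own line and defaults stated once. The HypotheticalColumnDescriptor type and its builder are invented for illustration and are not the HColumnDescriptor API proposed in D1851.

{code:java}
// Illustration only: an invented descriptor and builder, not HBase API.
public class BuilderPatternDemo {
  static final class HypotheticalColumnDescriptor {
    final String name;
    final int maxVersions;
    final boolean inMemory;

    private HypotheticalColumnDescriptor(Builder b) {
      this.name = b.name;
      this.maxVersions = b.maxVersions;
      this.inMemory = b.inMemory;
    }

    static Builder newBuilder(String name) { return new Builder(name); }

    static final class Builder {
      private final String name;
      private int maxVersions = 3;      // defaults live in exactly one place
      private boolean inMemory = false;

      private Builder(String name) { this.name = name; }
      Builder setMaxVersions(int v) { this.maxVersions = v; return this; }
      Builder setInMemory(boolean b) { this.inMemory = b; return this; }
      HypotheticalColumnDescriptor build() {
        return new HypotheticalColumnDescriptor(this);
      }
    }
  }

  public static void main(String[] args) {
    // Each setter on its own line keeps merges and cherry-picks clean.
    HypotheticalColumnDescriptor cf = HypotheticalColumnDescriptor
        .newBuilder("content")
        .setMaxVersions(5)
        .setInMemory(true)
        .build();
    System.out.println(cf.name + " versions=" + cf.maxVersions);
  }
}
{code}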
[jira] [Commented] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213825#comment-13213825 ] Phabricator commented on HBASE-5387: mbautin has committed the revision [jira] [HBASE-5387] [89-fb] Reuse compression streams in HFileBlock.Writer. REVISION DETAIL https://reviews.facebook.net/D1725 COMMIT https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1292434 Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, D1719.4.patch, D1719.5.patch, D1725.1.patch, D1725.2.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, Fix-deflater-leak-2012-02-11_17_13_10.patch, Fix-deflater-leak-2012-02-12_00_37_27.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
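For illustration, a minimal sketch of the underlying idea (the actual patch works at the level of Hadoop compression codecs and streams, not raw java.util.zip): keep one Deflater, reset() it between blocks, and release its native buffer explicitly instead of relying on the finalizer.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class ReusableCompressorSketch {
  // One Deflater reused across blocks; its native buffer is freed explicitly
  // with end() instead of waiting for a finalizer to run.
  private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);

  public byte[] compressBlock(byte[] block) throws IOException {
    deflater.reset();                         // make the same codec usable for the next block
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DeflaterOutputStream out = new DeflaterOutputStream(bos, deflater);
    out.write(block);
    out.finish();                             // flush compressed output without destroying the Deflater
    return bos.toByteArray();
  }

  public void close() {
    deflater.end();                           // deterministic release of native memory
  }
}
{code}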
[jira] [Updated] (HBASE-5445) Add PB-based calls to HMasterInterface
[ https://issues.apache.org/jira/browse/HBASE-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-5445: --- Attachment: HMasterProtocol.proto hbase.proto Attaching the (incomplete) proto definitions for HMasterInterface that I did sometime back. Greg, you might find them useful. Add PB-based calls to HMasterInterface -- Key: HBASE-5445 URL: https://issues.apache.org/jira/browse/HBASE-5445 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Attachments: HMasterProtocol.proto, hbase.proto -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213830#comment-13213830 ] Phabricator commented on HBASE-5441: sc has commented on the revision HBASE-5441 [jira] HRegionThriftServer may not start because of a race-condition. Sorry. I synchronized on the wrong reference and I should have run the test before updating the patch. I will update it soon. REVISION DETAIL https://reviews.facebook.net/D1857 BRANCH hb-5441 HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5357: --- Attachment: D1851.2.patch mbautin updated the revision [jira] [HBASE-5357] Refactoring: use the builder pattern for HColumnDescriptor. Reviewers: JIRA, todd, stack, tedyu, Kannan, Karthik, Liyin Addressing Stack's comment. REVISION DETAIL https://reviews.facebook.net/D1851 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java src/main/java/org/apache/hadoop/hbase/client/UnmodifyableHColumnDescriptor.java src/main/java/org/apache/hadoop/hbase/thrift/ThriftUtilities.java src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/TestSerialization.java src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestScannerSelectionUsingTTL.java src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java src/test/java/org/apache/hadoop/hbase/regionserver/TestColumnSeeking.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSeekOptimizations.java src/test/java/org/apache/hadoop/hbase/regionserver/TestWideScanner.java src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213855#comment-13213855 ] Hadoop QA commented on HBASE-5357: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515626/D1851.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 60 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 151 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestWideScanner Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1012//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1012//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1012//console This message is automatically generated. Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3149) Make flush decisions per column family
[ https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213858#comment-13213858 ] Nicolas Spiegelberg commented on HBASE-3149: @Lars/Stack: note that the number of StoreFiles necessary to store a given amount of data N is O(log N) with the existing compaction algorithm. This means that setting the compaction min size to a low value will not result in significantly more files. Furthermore, what's hurting performance is not the number of files but the size of each file. The extra files will be very small and take up only a minority of the space in the LRU cache. Every time you unnecessarily compact files, you have to repopulate that StoreFile in the LRU cache and incur a lot of disk reads in addition to the obvious write increase. This is all to say that I would recommend defaulting it that low, because the downsides are very minimal and the benefit can be substantial IO gains. bq. At the same time, I'd think this issue still worth some time; if lots of cfs and only one is filling, its silly to flush the others as we do now because one is over the threshold. Why is this silly? With cache-on-write, the data is still cached in memory. It's just migrated from the MemStore to the BlockCache, which has comparable performance. Furthermore, BlockCache data is compressed, so it then takes up less space. Flushing also minimizes the number of HLogs and decreases recovery time. Flushing would be bad if it meant we weren't optimally using the global MemStore size, but we currently are. bq. This surely seems a specific setting for this use-case, and there are others that need a slightly different setting. If you mix those two on the same cluster, then having only one global setting to adjust this seems restrictive? Should this be a setting per table, like the flush size? I think this is a better default, not a one-size-fits-all setting. I agree that this should be toggleable on a per-CF basis, hence HBASE-5335. Make flush decisions per column family -- Key: HBASE-3149 URL: https://issues.apache.org/jira/browse/HBASE-3149 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Karthik Ranganathan Assignee: Nicolas Spiegelberg Priority: Critical Fix For: 0.92.1 Today, the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
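For illustration, a minimal sketch (not HBase code) of the per-column-family decision the issue title asks for: only families whose memstore exceeds a per-family threshold are flushed, instead of flushing every family whenever the region-wide total crosses the limit. The class and method names are illustrative.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PerFamilyFlushChooser {
  // Return the families worth flushing: those whose memstore size alone
  // exceeds the per-family flush threshold.
  public static List<String> familiesToFlush(Map<String, Long> memstoreSizesByFamily,
                                             long perFamilyFlushSize) {
    List<String> chosen = new ArrayList<String>();
    for (Map.Entry<String, Long> e : memstoreSizesByFamily.entrySet()) {
      if (e.getValue() >= perFamilyFlushSize) {
        chosen.add(e.getKey());   // only the large families pay the flush cost
      }
    }
    return chosen;
  }
}
{code}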
[jira] [Commented] (HBASE-5433) [REST] Add metrics to keep track of success/failure count
[ https://issues.apache.org/jira/browse/HBASE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213869#comment-13213869 ] Mubarak Seyed commented on HBASE-5433: -- bq. can you see the metrics coming out in your metrics system? They show ok? Tested in a live cluster, could see the metrics in Ganglia. Thanks. [REST] Add metrics to keep track of success/failure count - Key: HBASE-5433 URL: https://issues.apache.org/jira/browse/HBASE-5433 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Labels: noob Fix For: 0.94.0 Attachments: HBASE-5433.trunk.v1.patch In a production environment, successful REST requests are not exposed to the metric system, as we have only one metric (requests) today. Proposing to add more metrics such as successful_get_count, failed_get_count, successful_put_count, failed_put_count. The current implementation increments the request count at the beginning of the method implementation, which makes it very hard to monitor requests (unless you turn on debug, find the row_key and validate it in get/scan using the hbase shell). It would be very useful for ops to keep an eye on this, since requests from across data centers write data to one cluster using the REST gateway through a load balancer (and there is no visibility into which REST-server/RS failed to write data) {code} Response update(final CellSetModel model, final boolean replace) { // for requests servlet.getMetrics().incrementRequests(1); .. .. table.put(puts); table.flushCommits(); ResponseBuilder response = Response.ok(); // for successful_get_count servlet.getMetrics().incrementSuccessfulGetRequests(1); return response.build(); } catch (IOException e) { // for failed_get_count servlet.getMetrics().incrementFailedGetRequests(1); throw new WebApplicationException(e, Response.Status.SERVICE_UNAVAILABLE); } finally { } } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213878#comment-13213878 ] Mubarak Seyed commented on HBASE-5434: -- The _compactionProgressPct_ value could not be obtained, as it depends on [HBASE-3943|https://issues.apache.org/jira/browse/HBASE-3943], so _compactionProgressPct_ will not be included in the output. Thanks. [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiles=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region, but the master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213883#comment-13213883 ] Phabricator commented on HBASE-5074: stack has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Answering Dhruba. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Seems like we could have better names for these methods, ones that give more of a clue as to what they are about. getBackingFS, getNoChecksumFS? Maybe you are keeping them generic like this because you will be back in this area again soon doing another beautiful speedup on top of this checksumming fix (When are we going to do read-ahead? Would that speed up scanning?) src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:44 ok. np. src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java:49 Ok. So, two readers. Our file count is going to go up? We should release note this as a side effect of enabling this feature (previously you may have been well below the xceivers limit but now you could go over the top?) I didn't notice this was going on. Need to foreground it, I'd say. src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:84 I figured. It's fine as is. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
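For illustration, a minimal sketch of the two-filesystem split being discussed (the method names follow the review suggestion and are not necessarily the final patch's API): one checksum-verifying FileSystem for writes and fallback reads, and a second instance with HDFS checksum verification turned off for the normal read path, so HBase-level block checksums can replace the extra iop to the separate checksum file.
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class TwoViewFileSystemSketch {
  private final FileSystem backingFs;      // normal, checksum-verifying view (writes, fallback)
  private final FileSystem noChecksumFs;   // read path that skips HDFS checksum verification

  public TwoViewFileSystemSketch(Configuration conf) throws IOException {
    this.backingFs = FileSystem.get(conf);
    this.noChecksumFs = FileSystem.newInstance(conf);
    // Assumption for this sketch: turning off verification here avoids the
    // extra disk iop for the HDFS-side checksum data on reads.
    this.noChecksumFs.setVerifyChecksum(false);
  }

  public FileSystem getBackingFs()    { return backingFs; }
  public FileSystem getNoChecksumFs() { return noChecksumFs; }
}
{code}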
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213897#comment-13213897 ] Mubarak Seyed commented on HBASE-5434: -- Need to update the schema definition in the wiki page http://wiki.apache.org/hadoop/Hbase/Stargate *protobufs schema* {code} message StorageClusterStatus { message Region { required bytes name = 1; optional int32 stores = 2; optional int32 storefiles = 3; optional int32 storefileSizeMB = 4; optional int32 memstoreSizeMB = 5; optional int32 storefileIndexSizeMB = 6; optional int64 readRequestsCount = 7; optional int64 writeRequestsCount = 8; optional int32 rootIndexSizeKB = 9; optional int32 totalStaticIndexSizeKB = 10; optional int32 totalStaticBloomSizeKB = 11; optional int64 totalCompactingKVs = 12; optional int64 currentCompactedKVs = 13; } .. } {code} *XML schema* {code} <complexType name="Region"> <attribute name="name" type="base64Binary"></attribute> <attribute name="stores" type="int"></attribute> <attribute name="storefiles" type="int"></attribute> <attribute name="storefileSizeMB" type="int"></attribute> <attribute name="memstoreSizeMB" type="int"></attribute> <attribute name="storefileIndexSizeMB" type="int"></attribute> <attribute name="readRequestsCount" type="int"></attribute> <attribute name="writeRequestsCount" type="int"></attribute> <attribute name="rootIndexSizeKB" type="int"></attribute> <attribute name="totalStaticIndexSizeKB" type="int"></attribute> <attribute name="totalStaticBloomSizeKB" type="int"></attribute> <attribute name="totalCompactingKVs" type="int"></attribute> <attribute name="currentCompactedKVs" type="int"></attribute> </complexType> {code} [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiles=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region, but the master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213901#comment-13213901 ] Phabricator commented on HBASE-5407: Liyin has commented on the revision [jira][HBASE-5407][89-fb] Show the per-region level request/sec count in the web ui. Thanks for Prakash's comments. Answering them inline. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/metrics/RequestMetrics.java:53 The run function of the HRegionServer will periodically (every hbase.regionserver.msginterval) call createRegionLoad, which will refresh the request/sec metrics. It follows the same way other metrics get updated. This function is also called when you refresh the web page. src/main/resources/hbase-webapps/regionserver/regionserver.jsp:22 sure. I will remove them. REVISION DETAIL https://reviews.facebook.net/D1779 Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
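For illustration, a minimal sketch (names are illustrative, not the actual RequestMetrics class) of turning a per-region request counter into a requests/sec value each time the server builds its region load report, which is the periodic refresh described above.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class RequestRateMetricSketch {
  private final AtomicLong totalRequests = new AtomicLong();
  private long lastCount;
  private long lastUpdateMillis = System.currentTimeMillis();
  private double requestsPerSecond;

  // Called on every read/write request handled for the region.
  public void increment() {
    totalRequests.incrementAndGet();
  }

  // Called periodically (e.g. whenever the region load is rebuilt) to turn the
  // counter delta into a rate since the last refresh.
  public synchronized double updateAndGetRate() {
    long now = System.currentTimeMillis();
    long count = totalRequests.get();
    long elapsedMs = Math.max(1, now - lastUpdateMillis);
    requestsPerSecond = (count - lastCount) * 1000.0 / elapsedMs;
    lastCount = count;
    lastUpdateMillis = now;
    return requestsPerSecond;
  }
}
{code}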
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213905#comment-13213905 ] Mubarak Seyed commented on HBASE-5434: -- @Andrew Can you please review the patch? Thanks. [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiles=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region, but the master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5441: --- Attachment: HBASE-5441.D1857.4.patch sc updated the revision HBASE-5441 [jira] HRegionThriftServer may not start because of a race-condition. Reviewers: tedyu, dhruba, JIRA Synchronized on this instead of admin REVISION DETAIL https://reviews.facebook.net/D1857 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
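For illustration, a minimal sketch (class and field names are illustrative, not the patch itself) of the shape of the fix described above: create the HBaseAdmin lazily instead of in the constructor, and guard the creation by synchronizing on the handler itself rather than on the still-null admin reference.
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LazyAdminHolder {
  private final Configuration conf;
  private HBaseAdmin admin;   // created on first use, not when the handler is constructed

  public LazyAdminHolder(Configuration conf) {
    this.conf = conf;
  }

  // Synchronize on the holder ("this"), mirroring the "synchronized on this
  // instead of admin" change; synchronizing on the null admin field would not work.
  public synchronized HBaseAdmin getAdmin() throws IOException {
    if (admin == null) {
      // May still fail while the master is starting; callers retry later
      // instead of the region server failing at thread-initialization time.
      admin = new HBaseAdmin(conf);
    }
    return admin;
  }
}
{code}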
[jira] [Created] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Priority: Minor HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
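For illustration, a hedged sketch of the kind of guard test proposed here: pin a sample of class-name-to-code assignments in a static table so that an accidental insertion in the middle of the static block fails the test instead of silently shifting every later code. The codes and the lookup helper below are placeholders, not the real HbaseObjectWritable values.
{code:java}
import static org.junit.Assert.assertEquals;
import java.util.HashMap;
import java.util.Map;
import org.junit.Test;

public class TestWritableCodeStabilitySketch {
  // Expected class-name -> code pairs; the numbers here are placeholders.
  private static final Map<String, Integer> EXPECTED = new HashMap<String, Integer>();
  static {
    EXPECTED.put("org.apache.hadoop.hbase.HConstants", 1);   // placeholder code
    EXPECTED.put("org.apache.hadoop.hbase.client.Put", 2);   // placeholder code
  }

  @Test
  public void testCodesDoNotShift() {
    for (Map.Entry<String, Integer> e : EXPECTED.entrySet()) {
      // codeForClassName(...) stands in for however the production class
      // exposes its class-to-code mapping.
      assertEquals(e.getValue().intValue(), codeForClassName(e.getKey()));
    }
  }

  // Stub so this sketch compiles stand-alone; a real test would query
  // HbaseObjectWritable's actual table.
  private int codeForClassName(String className) {
    return EXPECTED.get(className);
  }
}
{code}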
[jira] [Updated] (HBASE-5455) Add test to avoid unintentional reordering of items in HbaseObjectWritable
[ https://issues.apache.org/jira/browse/HBASE-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5455: - Fix Version/s: 0.94.0 Add test to avoid unintentional reordering of items in HbaseObjectWritable -- Key: HBASE-5455 URL: https://issues.apache.org/jira/browse/HBASE-5455 Project: HBase Issue Type: Test Reporter: Michael Drzal Priority: Minor Fix For: 0.94.0 HbaseObjectWritable has a static initialization block that assigns ints to various classes. The int is assigned by using a local variable that is incremented after each use. If someone adds a line in the middle of the block, this throws off everything after the change, and can break client compatibility. There is already a comment to not add/remove lines at the beginning of this block. It might make sense to have a test against a static set of ids. If something gets changed unintentionally, it would at least fail the tests. If the change was intentional, at the very least the test would need to get updated, and it would be a conscious decision. https://issues.apache.org/jira/browse/HBASE-5204 contains the fix for one issue of this type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4618) HBase backups
[ https://issues.apache.org/jira/browse/HBASE-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213941#comment-13213941 ] Lars Hofhansl commented on HBASE-4618: -- Are you planning to release the various tools you use as open source? At Salesforce we need to get started seriously on backup procedures and I would like to avoid a lot of duplicate work. HBase backups - Key: HBASE-4618 URL: https://issues.apache.org/jira/browse/HBASE-4618 Project: HBase Issue Type: Umbrella Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan We have been working on the ability to do backups in HBase with different levels of protection. This is an umbrella task for all the backup related changes. Here are some kinds of changes - will create separate issues for them: Roughly here are a few flavors of backups giving increasing levels of guarantees: 1. Per cf backups 2. Multi-cf backups with row atomicity preserved 3. Multi-cf backups with row atomicity and point in time recovery. On the perf dimension, here is a list of improvements: 1. Copy the files - regular hadoop cp 2. Use fast copy - copy blocks and stitch them together, saves top of rack bandwidth 3. Use fast copy with hard links - no file copy, it does only ext3 level linking. On the durability of data side: 1. Ability to back up data onto the same racks as those running HBase 2. Intra-datacenter backup 3. Inter datacenter backup Restores: 1. Restore with a table name different from the backed up table name 2. Restore a backed up table when HBase cluster is not running at restore time 3. Restore into a live and running cluster Operationally: 1. How to set up backups in live cluster 2. Setting up intra-DC 3. cross-DC backups 4. Verifying a backup is good -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5074: --- Attachment: D1521.8.patch dhruba updated the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Reviewers: mbautin Changed names of HFileSystem methods/variables to better reflect reality. REVISION DETAIL https://reviews.facebook.net/D1521 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java src/test/java/org/apache/hadoop/hbase/regionserver/HFileReadWriteTest.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/fs src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java src/main/java/org/apache/hadoop/hbase/util/ChecksumFactory.java src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213959#comment-13213959 ] Phabricator commented on HBASE-5074: dhruba has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. Ok. So, two readers. Our file count is going to go up? The file count should not go up. We still do the same number of ios to hdfs, so the number of concurrent IOs on a datanode should still be the same, so the number of xceivers on the datanode should not be adversely affected by this patch. Please let me know if I am missing something here. REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213958#comment-13213958 ] Phabricator commented on HBASE-5441: tedyu has commented on the revision HBASE-5441 [jira] HRegionThriftServer may not start because of a race-condition. Patch looks good. Will integrate after unit tests pass. REVISION DETAIL https://reviews.facebook.net/D1857 BRANCH hb-5441 HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
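As context for the race described above, a hedged sketch of one possible mitigation follows: retry HBaseAdmin construction until the master has started. This is illustrative only and is not the patch attached to HBASE-5441; the retry bound and sleep interval are arbitrary.
{code}
// Illustrative retry loop around HBaseAdmin construction; not the HBASE-5441 patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public final class AdminRetrySketch {
  public static HBaseAdmin createAdminWithRetries(Configuration conf) throws Exception {
    final int attempts = 10;      // arbitrary illustrative bound
    final long sleepMs = 1000L;
    for (int i = 1; i <= attempts; i++) {
      try {
        return new HBaseAdmin(conf);   // fails while the master is still starting up
      } catch (Exception e) {          // e.g. a wrapped ServerNotRunningYetException
        if (i == attempts) {
          throw e;
        }
        Thread.sleep(sleepMs);         // back off and give the master time to come up
      }
    }
    throw new IllegalStateException("unreachable");
  }
}
{code}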
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213973#comment-13213973 ] Phabricator commented on HBASE-5407: khemani has accepted the revision [jira][HBASE-5407][89-fb] Show the per-region level request/sec count in the web ui. REVISION DETAIL https://reviews.facebook.net/D1779 BRANCH regionRequest Show the per-region level request/sec count in the web ui - Key: HBASE-5407 URL: https://issues.apache.org/jira/browse/HBASE-5407 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch It would be nice to show the per-region level request/sec count in the web ui, especially when debugging the hot region problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213980#comment-13213980 ] Hadoop QA commented on HBASE-5441: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515639/HBASE-5441.D1857.4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 151 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1013//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1013//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1013//console This message is automatically generated. HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. 
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
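For illustration, a minimal PowerMock + Mockito test in the spirit of Benoit's suggestion is sketched below. RegionClock is a hypothetical final class defined inline; the point is only that a final method can be stubbed without adding a test-only accessor to any HBase API.
{code}
// Illustrative only: PowerMock stubbing a final method of a final class.
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.mockito.PowerMockito;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

@RunWith(PowerMockRunner.class)
@PrepareForTest(TestRegionClockMocking.RegionClock.class)
public class TestRegionClockMocking {

  /** Hypothetical final class that plain Mockito could not mock. */
  public static final class RegionClock {
    public final long now() { return System.currentTimeMillis(); }
  }

  @Test
  public void finalMethodCanBeStubbed() {
    RegionClock clock = PowerMockito.mock(RegionClock.class);
    PowerMockito.when(clock.now()).thenReturn(42L);  // stub the final method
    assertEquals(42L, clock.now());                  // no setter, subclass or widened API needed
  }
}
{code}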
[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213987#comment-13213987 ] Nicolas Spiegelberg commented on HBASE-5332: Patch works for trunk as well. Needed to fix a single conflict with Store.FIXED_OVERHEAD. Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
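A hedged sketch of the underlying idea: derive the jitter from stable state (a fixed seed, illustrated here with a hash of a store name) rather than Math.random(), so a restarted server recomputes the same compaction time. This is not the committed D1785 change; the seed choice is an assumption for illustration.
{code}
// Illustrative deterministic jitter: same seed, same offset, across restarts.
import java.util.Random;

public final class CompactionJitterSketch {
  /** delay + jitter*(1 - 2*r), where r is reproducible for a fixed seed. */
  public static long jitteredDelay(long delay, long jitter, long stableSeed) {
    double r = new Random(stableSeed).nextDouble();   // deterministic, unlike Math.random()
    return delay + (long) (jitter * (1.0 - 2.0 * r));
  }

  public static void main(String[] args) {
    long seed = "region-abc/store-cf1".hashCode();    // illustrative stable seed
    long delay = 7L * 24 * 3600 * 1000;               // one week, in ms
    long jitter = 3600000L;                           // one hour, in ms
    System.out.println(jitteredDelay(delay, jitter, seed));
    System.out.println(jitteredDelay(delay, jitter, seed)); // prints the same value again
  }
}
{code}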
[jira] [Commented] (HBASE-5441) HRegionThriftServer may not start because of a race-condition
[ https://issues.apache.org/jira/browse/HBASE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213985#comment-13213985 ] Zhihong Yu commented on HBASE-5441: --- Latest test run is clean - no test regression. Will integrate the patch tomorrow if I don't hear objection. HRegionThriftServer may not start because of a race-condition - Key: HBASE-5441 URL: https://issues.apache.org/jira/browse/HBASE-5441 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Attachments: HBASE-5441.D1845.1.patch, HBASE-5441.D1845.2.patch, HBASE-5441.D1857.2.patch, HBASE-5441.D1857.3.patch, HBASE-5441.D1857.4.patch This happens because the master is not started when ThriftServerRunner tries to create an HBaseAdmin. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1333) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HBaseAdmin.init(HBaseAdmin.java:108) at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.init(ThriftServerRunner.java:516) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer$HBaseHandlerRegion.init(HRegionThriftServer.java:104) at org.apache.hadoop.hbase.regionserver.HRegionThriftServer.init(HRegionThriftServer.java:74) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeThreads(HRegionServer.java:646) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:546) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:658) at java.lang.Thread.run(Thread.java:662) 2012-02-21 16:38:18,223 INFO org.apache.hadoop.hba -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213990#comment-13213990 ] Mikhail Bautin commented on HBASE-5456: --- I think this only makes sense for HBase if people start running all unit tests for every patch (not just small and medium tests). These advanced reflection features convert some frequent types of errors from compile-time to test-time. Also, a lot of IDE search and refactoring features will be broken. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213994#comment-13213994 ] Phabricator commented on HBASE-5332: nspiegelberg has committed the revision [jira] [HBASE-5332] Deterministic Compaction Jitter. REVISION DETAIL https://reviews.facebook.net/D1785 COMMIT https://reviews.facebook.net/rHBASE1292495 Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5332: --- Fix Version/s: 0.94.0 Status: Patch Available (was: Open) Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0 Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5332: --- Status: Open (was: Patch Available) Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0 Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5332: --- Attachment: HBASE-5332.patch trunk patch Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0 Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5332: --- Status: Patch Available (was: Open) resubmitting after creating trunk-specific patch file Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0 Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5332) Deterministic Compaction Jitter
[ https://issues.apache.org/jira/browse/HBASE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-5332: --- Resolution: Fixed Status: Resolved (was: Patch Available) Deterministic Compaction Jitter --- Key: HBASE-5332 URL: https://issues.apache.org/jira/browse/HBASE-5332 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.94.0 Attachments: D1785.1.patch, D1785.2.patch, D1785.3.patch, HBASE-5332.patch Currently, we add jitter to a compaction using delay + jitter*(1 - 2*Math.random()). Since this is non-deterministic, we can get major compaction storms on server restart as half the Stores that were set to delay + jitter will now be set to delay - jitter. We need a more deterministic way to jitter major compactions so this information can persist across server restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214002#comment-13214002 ] Hadoop QA commented on HBASE-5074: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515642/D1521.8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 55 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1014//console This message is automatically generated. support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214006#comment-13214006 ] Zhihong Yu commented on HBASE-5456: --- My understanding is that Hadoop QA does run all tests. From https://builds.apache.org/job/PreCommit-HBASE-Build/1013/console: {code} Tests run: 885, Failures: 6, Errors: 1, Skipped: 10 [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 37:16.809s {code} From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2665/console: {code} Tests run: 885, Failures: 0, Errors: 3, Skipped: 10 [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 48:14.023s {code} Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5279: - Assignee: Tobias Herbert NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Assignee: Tobias Herbert Priority: Critical Fix For: 0.92.1 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
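A hedged sketch of the kind of guard the reporter describes is shown below: when reading catalog rows, treat a missing info:regioninfo cell as a skippable row instead of dereferencing a null HRegionInfo. The helper class and method names are illustrative, not the HBASE-5279 patch.
{code}
// Illustrative guard against catalog rows with an empty info:regioninfo cell.
import java.io.IOException;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Writables;

public final class MetaRowGuardSketch {
  /** Returns the region info for a catalog row, or null if the row is damaged. */
  public static HRegionInfo parseRegionInfo(Result metaRow) throws IOException {
    byte[] value = metaRow.getValue(HConstants.CATALOG_FAMILY,
        HConstants.REGIONINFO_QUALIFIER);
    if (value == null || value.length == 0) {
      // Matches the CatalogJanitor warning: REGIONINFO_QUALIFIER is empty.
      return null;  // the caller should skip this row rather than abort the master
    }
    return Writables.getHRegionInfo(value);
  }
}
{code}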
[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0
[ https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5279: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) NPE in Master after upgrading to 0.92.0 --- Key: HBASE-5279 URL: https://issues.apache.org/jira/browse/HBASE-5279 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Tobias Herbert Assignee: Tobias Herbert Priority: Critical Fix For: 0.92.1 Attachments: HBASE-5279-v2.patch, HBASE-5279.patch I have upgraded my environment from 0.90.4 to 0.92.0 after the table migration I get the following error in the master (permanent) {noformat} 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326) at java.lang.Thread.run(Thread.java:662) 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 org.apache.hadoop.hbase.master.HMaster - Aborting {noformat} I think that's because I had a hard crash in the cluster a while ago - and the following WARN since then {noformat} 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8} {noformat} my patch was simple to go around the NPE (as the other code around the lines) but I don't know if that's correct -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214013#comment-13214013 ] Zhihong Yu commented on HBASE-5456: --- From Ted Dunning: Actually jmockit uses byte code patching so you may suffer less reflection overhead than expected. My guess is that powermock is doing something quite similar. Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5425) Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler)
[ https://issues.apache.org/jira/browse/HBASE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-5425: - Assignee: terry zhang Punt on the timeout doesn't work in BulkEnabler#waitUntilDone (master's EnableTableHandler) Key: HBASE-5425 URL: https://issues.apache.org/jira/browse/HBASE-5425 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5, 0.92.0 Reporter: terry zhang Assignee: terry zhang Fix For: 0.94.0 Attachments: HBASE-5425.patch please take a look at the code below in EnableTableHandler (hbase master): {code:title=EnableTableHandler.java|borderStyle=solid} protected boolean waitUntilDone(long timeout) throws InterruptedException { . int lastNumberOfRegions = this.countOfRegionsInTable; while (!server.isStopped() && remaining > 0) { Thread.sleep(waitingTimeForEvents); regions = assignmentManager.getRegionsOfTable(tableName); if (isDone(regions)) break; // Punt on the timeout as long we make progress if (regions.size() > lastNumberOfRegions) { lastNumberOfRegions = regions.size(); timeout += waitingTimeForEvents; } remaining = timeout - (System.currentTimeMillis() - startTime); } private boolean isDone(final List<HRegionInfo> regions) { return regions != null && regions.size() >= this.countOfRegionsInTable; } {code} We can easily see that if we initialize lastNumberOfRegions = this.countOfRegionsInTable, the punt-on-timeout code will never be executed. I think initializing lastNumberOfRegions = 0 makes it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
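Restated as a hedged, self-contained sketch of the proposed fix (names and the polling callback are illustrative, not EnableTableHandler's actual fields): starting the progress counter at zero lets the punt-on-timeout branch fire whenever new regions come online.
{code}
// Self-contained restatement of the proposed fix; the Callable stands in for
// assignmentManager.getRegionsOfTable(tableName).size().
import java.util.concurrent.Callable;

public final class WaitUntilDoneSketch {
  public static boolean waitUntilDone(long timeout, long waitingTimeForEvents,
      int countOfRegionsInTable, Callable<Integer> onlineRegionCount) throws Exception {
    long startTime = System.currentTimeMillis();
    long remaining = timeout;
    int lastNumberOfRegions = 0;               // was countOfRegionsInTable in the buggy code
    while (remaining > 0) {
      Thread.sleep(waitingTimeForEvents);
      int regions = onlineRegionCount.call();
      if (regions >= countOfRegionsInTable) {
        return true;                           // isDone()
      }
      if (regions > lastNumberOfRegions) {     // progress made: punt on the timeout
        lastNumberOfRegions = regions;
        timeout += waitingTimeForEvents;
      }
      remaining = timeout - (System.currentTimeMillis() - startTime);
    }
    return false;
  }
}
{code}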
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214025#comment-13214025 ] Gregory Chanan commented on HBASE-5317: --- @Ted: Checked it on a mac, seeing the same failure as you. Can you confirm something for me? If you cd to target/org.apache.hadoop.mapred.MiniMRCluster and run the following commands, do you see something similar? $ find . -name stderr ./org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1329939539132_0001/container_1329939539132_0001_01_01/stderr find . -name syslog $ $ cat ./org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1329939539132_0001/container_1329939539132_0001_01_01/stderr /bin/bash: /bin/java: No such file or directory Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214029#comment-13214029 ] Todd Lipcon commented on HBASE-5456: I tend to agree with Mikhail. The presence of a protected method named getRegionServerServicesForTests or something lets me know, when I'm working on the code, that this method is used, and I can easily use eclipse to tell me which unit tests use it. PowerMock and other tools which use strings to refer to functions aren't going to play nice with that, so it's easy to be unaware of test dependencies. I think these tools are best used sparingly and only to mock out system dependencies (like new FileInputStream(), InetSocketAddress.getHostName(), or System.currentTimeMillis()) Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
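A hedged illustration of the alternative Todd describes, with hypothetical Clock/SystemClock names: give production code an explicit seam for the system dependency so tests inject a fake and IDE search still finds every use.
{code}
// Illustrative explicit seam for a system dependency; names are hypothetical.
public final class ClockSeamSketch {
  interface Clock { long currentTimeMillis(); }

  static final class SystemClock implements Clock {
    public long currentTimeMillis() { return System.currentTimeMillis(); }
  }

  /** Production code takes the seam; tests pass a fixed clock; IDE search finds both. */
  static long ageMillis(long createdAt, Clock clock) {
    return clock.currentTimeMillis() - createdAt;
  }

  public static void main(String[] args) {
    Clock fixed = new Clock() {                 // a test would inject this fake
      public long currentTimeMillis() { return 1000L; }
    };
    System.out.println(ageMillis(400L, fixed)); // prints 600
  }
}
{code}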
[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression
[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214032#comment-13214032 ] He Yongqiang commented on HBASE-5313: - As a first step, we will go ahead with a simple columnar layout implementation and leave more advanced features (like nested column layout) to a follow-up. Restructure hfiles layout for better compression Key: HBASE-5313 URL: https://issues.apache.org/jira/browse/HBASE-5313 Project: HBase Issue Type: Improvement Components: io Reporter: dhruba borthakur Assignee: dhruba borthakur An HFile block contains a stream of key-values. Can we organize these kvs on disk in a better way so that we get much greater compression ratios? One option (thanks Prakash) is to store all the keys at the beginning of the block (let's call this the key-section) and then store all their corresponding values towards the end of the block. This will allow us to not even decompress the values when we are scanning and skipping over rows in the block. Any other ideas? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
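A hedged sketch of the layout Prakash suggests follows: within one block, all keys are written first and all values afterwards, so a scan can skip rows without touching the value bytes. The encoding below is purely illustrative, not the format HBASE-5313 ultimately defines.
{code}
// Illustrative key-section/value-section block encoding; not the HBASE-5313 format.
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

public final class KeyValueSplitBlockWriter {
  /** Encodes parallel key/value lists as: count, all keys, then all values. */
  public static byte[] encode(List<byte[]> keys, List<byte[]> values) throws IOException {
    if (keys.size() != values.size()) {
      throw new IllegalArgumentException("keys and values must be parallel lists");
    }
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeInt(keys.size());
    for (byte[] k : keys) {     // key section: similar prefixes sit together and compress well
      out.writeInt(k.length);
      out.write(k);
    }
    for (byte[] v : values) {   // value section: only touched once a key has matched
      out.writeInt(v.length);
      out.write(v);
    }
    out.flush();
    return bos.toByteArray();
  }
}
{code}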
[jira] [Created] (HBASE-5457) add inline index in data block for data which are not clustered together
add inline index in data block for data which are not clustered together Key: HBASE-5457 URL: https://issues.apache.org/jira/browse/HBASE-5457 Project: HBase Issue Type: New Feature Reporter: He Yongqiang As we went through our data schema, we found we have one large column family that just duplicates data from another column family; it is a re-org of the data that clusters it differently than the original column family in order to serve another type of query efficiently. If we compare this second column family with the similar situation in MySQL, it is like an index in MySQL. So if we could add an inline block index on the required columns, the second column family would no longer be needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214037#comment-13214037 ] Zhihong Yu commented on HBASE-5317: --- I do see similar output: {code} LM-SJN-00713032:org.apache.hadoop.mapred.MiniMRCluster zhihyu$ find . -name stderr ./org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1329947593635_0001/container_1329947593635_0001_01_01/stderr LM-SJN-00713032:org.apache.hadoop.mapred.MiniMRCluster zhihyu$ find . -name syslog LM-SJN-00713032:org.apache.hadoop.mapred.MiniMRCluster zhihyu$ cat org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1329947593635_0001/container_1329947593635_0001_01_01/stderr /bin/bash: /bin/java: No such file or directory {code} Fix TestHFileOutputFormat to work against hadoop 0.23 - Key: HBASE-5317 URL: https://issues.apache.org/jira/browse/HBASE-5317 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0, 0.94.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-5317-v0.patch, HBASE-5317-v1.patch, HBASE-5317-v3.patch, HBASE-5317-v4.patch, HBASE-5317-v5.patch, HBASE-5317-v6.patch, TEST-org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.xml Running mvn -Dhadoop.profile=23 test -P localTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat yields this on 0.92: Failed tests: testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): HFile for column family info-A not found Tests in error: test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0 (Is a directory) testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable It looks like on trunk, this also results in an error: testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): TestTable I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but haven't fixed the other 3 yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5456) Introduce PowerMock into our unit tests to reduce unnecessary method exposure
[ https://issues.apache.org/jira/browse/HBASE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214039#comment-13214039 ] Zhihong Yu commented on HBASE-5456: --- If the following methods can be made protected or package private, that would be some progress: {code} public Map<DataBlockEncoding, Integer> getEncodingCountsForTest() { src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java public static SchemaMetrics getUnknownInstanceForTest() { src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java public void compactRecentForTesting(int N) throws IOException { src/main/java/org/apache/hadoop/hbase/regionserver/Store.java public static User createUserForTesting(Configuration conf, public static User createUserForTesting(Configuration conf, public static User createUserForTesting(Configuration conf, src/main/java/org/apache/hadoop/hbase/security/User.java public long getNumQueriesForTesting(int chunk) { public long getNumPositivesForTesting(int chunk) { src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java {code} Introduce PowerMock into our unit tests to reduce unnecessary method exposure - Key: HBASE-5456 URL: https://issues.apache.org/jira/browse/HBASE-5456 Project: HBase Issue Type: Task Reporter: Zhihong Yu We should introduce PowerMock into our unit tests so that we don't have to expose methods intended to be used by unit tests. Here was Benoit's reply to a user of asynchbase about testability: OpenTSDB has unit tests that are mocking out HBaseClient just fine [1]. You can mock out pretty much anything on the JVM: final, private, JDK stuff, etc. All you need is the right tools. I've been very happy with PowerMock. It supports Mockito and EasyMock. I've never been keen on mutilating public interfaces for the sake of testing. With tools like PowerMock, we can keep the public APIs tidy while mocking and overriding anything, even in the most private guts of the classes. [1] https://github.com/stumbleupon/opentsdb/blob/master/src/uid/TestUniqueId.java#L66 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5458) Thread safety issues with Compression.Algorithm.GZ and CompressionTest
Thread safety issues with Compression.Algorithm.GZ and CompressionTest -- Key: HBASE-5458 URL: https://issues.apache.org/jira/browse/HBASE-5458 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.5 Reporter: David McIntosh Priority: Minor I've seen some occasional NullPointerExceptions in ZlibFactory.isNativeZlibLoaded(conf) during region server startups and the completebulkload process. This is being caused by a null configuration getting passed to the isNativeZlibLoaded method. I think this happens when 2 or more threads call the CompressionTest.testCompression method at once. If the GZ algorithm has not been tested yet both threads could continue on and attempt to load the compressor. For GZ the getCodec method is not thread safe which could lead to one thread getting a reference to a GzipCodec that has a null configuration. current: DefaultCodec getCodec(Configuration conf) { if (codec == null) { codec = new GzipCodec(); codec.setConf(new Configuration(conf)); } return codec; } one possible fix would be something like this: DefaultCodec getCodec(Configuration conf) { if (codec == null) { GzipCodec gzip = new GzipCodec(); gzip.setConf(new Configuration(conf)); codec = gzip; } return codec; } But that may not be totally safe without some synchronization. An upstream fix in CompressionTest could also prevent multi thread access to GZ.getCodec(conf) exceptions: 12/02/21 16:11:56 ERROR handler.OpenRegionHandler: Failed open of region=all-monthly,,1326263896983.bf574519a95263ec23a2bad9f5b8cbf4. java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2670) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2659) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63) at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84) ... 
9 more Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:890) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63) at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
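A hedged sketch of one way to make the lazy codec initialization safe for concurrent callers is shown below: double-checked locking on a volatile field, mirroring the "possible fix" in the description while adding the synchronization it notes is missing. This is not the committed HBASE-5458 patch.
{code}
// Illustrative thread-safe lazy init of the GZ codec via double-checked locking.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.io.compress.GzipCodec;

public final class SafeGzipCodecHolder {
  private static volatile DefaultCodec codec;  // volatile gives safe publication

  static DefaultCodec getCodec(Configuration conf) {
    DefaultCodec result = codec;
    if (result == null) {
      synchronized (SafeGzipCodecHolder.class) {
        result = codec;
        if (result == null) {
          GzipCodec gzip = new GzipCodec();
          gzip.setConf(new Configuration(conf)); // fully configure before publishing
          codec = result = gzip;
        }
      }
    }
    return result;                               // never returns a codec with a null conf
  }
}
{code}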
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214041#comment-13214041 ] Andrew Purtell commented on HBASE-5434: --- +1 Looks good Mubarak. [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy REST gateway based production environment, ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server), we can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount* but some home-grown tools needs to parse the output of /status/cluster and updates the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214051#comment-13214051 ] Mubarak Seyed commented on HBASE-5434: -- Thanks Andrew. @Ted: Can you please take care of updating the wiki page once it is committed, as I don't have access? Thanks. [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0 Attachments: HBASE-5434.trunk.v1.patch /status/cluster shows only {code} stores=2 storefiles=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5458) Thread safety issues with Compression.Algorithm.GZ and CompressionTest
[ https://issues.apache.org/jira/browse/HBASE-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5458: -- Description: I've seen some occasional NullPointerExceptions in ZlibFactory.isNativeZlibLoaded(conf) during region server startups and the completebulkload process. This is being caused by a null configuration getting passed to the isNativeZlibLoaded method. I think this happens when 2 or more threads call the CompressionTest.testCompression method at once. If the GZ algorithm has not been tested yet both threads could continue on and attempt to load the compressor. For GZ the getCodec method is not thread safe which could lead to one thread getting a reference to a GzipCodec that has a null configuration. {code} current: DefaultCodec getCodec(Configuration conf) { if (codec == null) { codec = new GzipCodec(); codec.setConf(new Configuration(conf)); } return codec; } {code} one possible fix would be something like this: {code} DefaultCodec getCodec(Configuration conf) { if (codec == null) { GzipCodec gzip = new GzipCodec(); gzip.setConf(new Configuration(conf)); codec = gzip; } return codec; } {code} But that may not be totally safe without some synchronization. An upstream fix in CompressionTest could also prevent multi thread access to GZ.getCodec(conf) exceptions: 12/02/21 16:11:56 ERROR handler.OpenRegionHandler: Failed open of region=all-monthly,,1326263896983.bf574519a95263ec23a2bad9f5b8cbf4. java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2670) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2659) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63) at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84) ... 
9 more Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:890) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63) at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84) ... 10 more was: I've seen some occasional NullPointerExceptions in
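The "upstream fix in CompressionTest" mentioned in the description could be as simple as serializing the first-time check, for example by making testCompression synchronized. A sketch that keeps the try/catch quoted in the comment below; the synchronized keyword and the early-return guard are the assumed additions, and it assumes the result cache is a nullable Boolean[]:
{code}
public static synchronized void testCompression(Compression.Algorithm algo)
    throws IOException {
  // Serialize first-time checks so only one thread ever drives
  // Algorithm.getCodec()/getCompressor() for an untested algorithm.
  if (compressionTestResults[algo.ordinal()] != null) {
    if (compressionTestResults[algo.ordinal()]) {
      return; // already verified
    }
    throw new IOException("Compression algorithm '" + algo
        + "' previously failed test.");
  }
  try {
    Compressor c = algo.getCompressor();
    algo.returnCompressor(c);
    compressionTestResults[algo.ordinal()] = true; // passes
  } catch (Throwable t) {
    compressionTestResults[algo.ordinal()] = false; // failure
    throw new IOException(t);
  }
}
{code}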
[jira] [Commented] (HBASE-5458) Thread safety issues with Compression.Algorithm.GZ and CompressionTest
[ https://issues.apache.org/jira/browse/HBASE-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214068#comment-13214068 ] Zhihong Yu commented on HBASE-5458: --- {code} try { Compressor c = algo.getCompressor(); algo.returnCompressor(c); compressionTestResults[algo.ordinal()] = true; // passes } catch (Throwable t) { compressionTestResults[algo.ordinal()] = false; // failure throw new IOException(t); } {code} How about catching NPE in the above code and call algo.getCompressor() again ? We set compressionTestResults to false after the retry fails. Thread safety issues with Compression.Algorithm.GZ and CompressionTest -- Key: HBASE-5458 URL: https://issues.apache.org/jira/browse/HBASE-5458 Project: HBase Issue Type: Bug Components: io Affects Versions: 0.90.5 Reporter: David McIntosh Priority: Minor I've seen some occasional NullPointerExceptions in ZlibFactory.isNativeZlibLoaded(conf) during region server startups and the completebulkload process. This is being caused by a null configuration getting passed to the isNativeZlibLoaded method. I think this happens when 2 or more threads call the CompressionTest.testCompression method at once. If the GZ algorithm has not been tested yet both threads could continue on and attempt to load the compressor. For GZ the getCodec method is not thread safe which could lead to one thread getting a reference to a GzipCodec that has a null configuration. {code} current: DefaultCodec getCodec(Configuration conf) { if (codec == null) { codec = new GzipCodec(); codec.setConf(new Configuration(conf)); } return codec; } {code} one possible fix would be something like this: {code} DefaultCodec getCodec(Configuration conf) { if (codec == null) { GzipCodec gzip = new GzipCodec(); gzip.setConf(new Configuration(conf)); codec = gzip; } return codec; } {code} But that may not be totally safe without some synchronization. An upstream fix in CompressionTest could also prevent multi thread access to GZ.getCodec(conf) exceptions: 12/02/21 16:11:56 ERROR handler.OpenRegionHandler: Failed open of region=all-monthly,,1326263896983.bf574519a95263ec23a2bad9f5b8cbf4. 
java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2670) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2659) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:312) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:63) at org.apache.hadoop.io.compress.GzipCodec.getCompressorType(GzipCodec.java:166) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236) at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:84) ... 9 more Caused by: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:89) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:890) at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
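A sketch of the retry Zhihong proposes above, folded into the quoted snippet. The single-retry shape is an assumption, and it only papers over the race rather than removing it:
{code}
try {
  Compressor c;
  try {
    c = algo.getCompressor();
  } catch (NullPointerException npe) {
    // The other thread may have finished configuring the codec by now,
    // so try once more before recording a failure.
    c = algo.getCompressor();
  }
  algo.returnCompressor(c);
  compressionTestResults[algo.ordinal()] = true; // passes
} catch (Throwable t) {
  compressionTestResults[algo.ordinal()] = false; // failure, even after the retry
  throw new IOException(t);
}
{code}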
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214071#comment-13214071 ] Mubarak Seyed commented on HBASE-4991: -- @Stack Thanks for the review.
bq. Do we need to add this method to the region server interface?
bq. {code} + public int getRegionsCount(byte[] regionName) throws IOException; {code}
bq. Can we not just count what comes back from the get on online regions?
We need to get the counts per table. _getOnlineRegions_ returns a _List<HRegion>_ for a table, but the client does not know the _tableName_ for a _regionName_ in our case; either we can do two calls (one to get the _HRegionInfo_ and read the _tableName_ from it, and another to get the _List<HRegion>_), or we can simplify by adding a new interface:
{code}
public int getRegionsCount(byte[] regionName) throws IOException {
  return getOnlineRegions(getRegionInfo(regionName).getTableName()).size();
}
{code}
bq. Do we have to run the region delete in the Master process? Can the client not do it?
The design choice is like [HBASE-4213|https://issues.apache.org/jira/browse/HBASE-4213]: the master creates a znode under _zookeeper.znode.parent/delete-region_, the RS trackers get notified of the children-changed event, and the RS which hosts the to-be-deleted region processes the delete-region request and updates the state in the _zookeeper.znode.parent/delete-region/encoded-region-name-to-be-deleted_ znode.
bq. Is it really necessary adding + public MasterDeleteRegionTracker getDeleteRegionTracker(); to the MasterServices? This will have a ripple effect through Tests and it seems like a bit of an exotic API to have in this basic Interface.
I will think a bit more and update you.
bq. Does all of this new code need to be in HRegionServer? Can it live in a class of its own?
I'd like to hear comments from the code review; we can refactor it into a helper class.
bq. There must be a million holes here (HRS crashes in middle of file moving or creation of the merged region, files partially moved or deleted).
I believe the _delete-region_ state in ZK will help to recover from failures. We need more test cases with individual failure scenarios such as an HRS crash, failure of the merged region, failure of a file remove in HDFS, failure of new region directory creation in HDFS, partial files, etc. I will add them when I do the stress test for Todd's suggestion.
bq. Does this code all need to be in core? Can we not make a few primitives and then run it all from outside in a tool or script w/ state recorded as we go so can resume if fail mid-way? There are a bunch of moving pieces here. Its all bundled up in core code so its going to be tough to test.
If we are considering _delete_region_ as a tool/util then we can refactor it as a tool/util, like the Online/Offline merge code.
bq. Adding this to onlineregions, + public void deleteRegion(String regionName) throws IOException, KeeperException;, do all removals from online regions now use this new API (Its probably good having it here... but just wondering about the places where regions currently get removed from online map, do they go a different route than this new one?)
The new API _deleteRegion()_ does more than just remove the region from the online-regions map; elsewhere we use {code} public boolean removeFromOnlineRegions(final String encodedName) {code} which is called from _openRegion()_, _refreshRegion()_, _createDaughters()_ in _SplitTransaction_, and _CloseRegionHandler_.
bq. How hard will it be to reuse parts to do say an online merge of a bunch of adjacent regions?
Once Todd's proposal is implemented, I will find a way to do more refactoring (to cut down repeated code).
bq. Are the enums duplicated?
Yes, I will take care of it in the refactoring.
bq. Why does zookeeper package have classes particular to master and regionserver?
We put all our ZK trackers in the zookeeper package, and this is how online schema change [HBASE-4213|https://issues.apache.org/jira/browse/HBASE-4213] was implemented.
Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss. Users may want to quickly dispose of out-of-date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
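To make the znode-based flow described in the comment above concrete, here is an illustrative sketch using the plain ZooKeeper client. The class name, znode data format, and lack of error handling are invented for illustration; the actual patch would go through HBase's ZooKeeper watcher/tracker classes rather than raw client calls:
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the master side of the delete-region handshake: create a
// per-region child under <base>/delete-region; the hosting RS watches the
// children and updates the child node's data as it makes progress.
public class DeleteRegionRequestSketch {
  public static void requestDelete(ZooKeeper zk, String baseZNode,
      String encodedRegionName) throws Exception {
    String parent = baseZNode + "/delete-region";
    if (zk.exists(parent, false) == null) {
      zk.create(parent, new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
    zk.create(parent + "/" + encodedRegionName, "PENDING".getBytes(),
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }
}
{code}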
[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size
[ https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4365: - Attachment: 4365-v2.txt Make it the square of the count of regions. Address also a problem found by j-d where I was getting region size from conf instead of from HTD. This patch works on trunk only. Will need to do a version for 0.92. Add a decent heuristic for region size -- Key: HBASE-4365 URL: https://issues.apache.org/jira/browse/HBASE-4365 Project: HBase Issue Type: Improvement Affects Versions: 0.92.1, 0.94.0 Reporter: Todd Lipcon Priority: Critical Labels: usability Attachments: 4365-v2.txt, 4365.txt A few of us were brainstorming this morning about what the default region size should be. There were a few general points made: - in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently - with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+) - for small tables you may want a small region size just so you can distribute load better across a cluster - for big tables, multi-GB is probably best -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
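For readers who want the shape of the heuristic rather than the patch itself, a sketch of the "square of the count of regions" idea: grow the effective split size with the square of how many regions of the table the server already carries, capped at the configured maximum. The base value, the cap, and the method name here are assumptions, not necessarily what 4365-v2.txt implements:
{code}
// Small tables split early (few regions => small split threshold);
// big tables settle at the normal multi-GB region size.
long effectiveSplitSize(int tableRegionCount, long memstoreFlushSize,
    long maxFileSize) {
  if (tableRegionCount <= 0) {
    return maxFileSize;
  }
  long grown = memstoreFlushSize * tableRegionCount * tableRegionCount;
  return Math.min(grown, maxFileSize);
}
{code}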
[jira] [Commented] (HBASE-5074) support checksums in HBase block cache
[ https://issues.apache.org/jira/browse/HBASE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214075#comment-13214075 ] Phabricator commented on HBASE-5074: stack has commented on the revision [jira] [HBASE-5074] Support checksums in HBase block cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:115 Please ignore my previous comment on renaming these methods. On reread, I think they are plenty clear enough as they are. src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:120 Nit: Change this to be an @return javadoc so its clear we are returning current state of this flag? src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:164 Does mean that this feature is on by default? Should we read configuration to figure whether its on or not? src/main/java/org/apache/hadoop/hbase/io/hfile/ChecksumUtil.java:73 Is this threadsafe? This looks like a shared object? REVISION DETAIL https://reviews.facebook.net/D1521 support checksums in HBase block cache -- Key: HBASE-5074 URL: https://issues.apache.org/jira/browse/HBASE-5074 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: D1521.1.patch, D1521.1.patch, D1521.2.patch, D1521.2.patch, D1521.3.patch, D1521.3.patch, D1521.4.patch, D1521.4.patch, D1521.5.patch, D1521.5.patch, D1521.6.patch, D1521.6.patch, D1521.7.patch, D1521.7.patch, D1521.8.patch, D1521.8.patch The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read into the HBase block cache actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
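On stack's question about defaults: one way to avoid enabling the feature unconditionally is to gate the checksum-skipping read path behind a configuration flag. A small sketch; the property name and the helper class are placeholders, not the actual patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Placeholder sketch: only hand reads to the second, checksum-skipping
// FileSystem when a config flag asks for HBase-level checksums.
class ChecksumSwitchSketch {
  static FileSystem chooseReadFs(Configuration conf, FileSystem normalFs,
      FileSystem noChecksumFs) {
    boolean hbaseChecksums =
        conf.getBoolean("hbase.regionserver.checksum.verify", false); // assumed key
    return hbaseChecksums ? noChecksumFs : normalFs;
  }
}
{code}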
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214074#comment-13214074 ] Phabricator commented on HBASE-5347: khemani has commented on the revision [HBASE-5347] [jira] GC free memory management in Level-1 Block Cache. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java:417 I had changed getKey() and getValue() to copy the data because these aren't on any performance-critical path. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java:1404 Yes, we unpin on close. That is why I had to change the return type of getGeneralBloomFilter() from DataInput to DataInputStream. REVISION DETAIL https://reviews.facebook.net/D1635 GC free memory management in Level-1 Block Cache Key: HBASE-5347 URL: https://issues.apache.org/jira/browse/HBASE-5347 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: D1635.5.patch On eviction of a block from the block-cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache. This will help us with * reducing gc pressure, especially in the old generation * making it possible to have non-java-heap memory backing the HFile blocks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
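The reference-counting idea at the heart of this issue, reduced to a toy sketch; the class and method names are hypothetical, not taken from D1635:
{code}
import java.util.concurrent.atomic.AtomicInteger;

// A cached block's backing buffer may only be recycled once every reader
// has dropped its reference, instead of waiting for the garbage collector.
class RefCountedBlock {
  private final AtomicInteger refCount = new AtomicInteger(1); // cache holds one ref
  private final byte[] buf;

  RefCountedBlock(byte[] buf) { this.buf = buf; }

  byte[] ref() {            // a reader pins the block before using it
    refCount.incrementAndGet();
    return buf;
  }

  void deref() {            // reader (or the cache, on eviction) releases its pin
    if (refCount.decrementAndGet() == 0) {
      recycle();            // safe to hand the buffer back to the slab allocator
    }
  }

  private void recycle() {
    // return buf to a slab allocator instead of leaving it for GC
  }
}
{code}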
[jira] [Commented] (HBASE-5347) GC free memory management in Level-1 Block Cache
[ https://issues.apache.org/jira/browse/HBASE-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214088#comment-13214088 ] Zhihong Yu commented on HBASE-5347: --- @Lars: Just saw your comment @ 20/Feb/12 00:08. I agree with your observation - it is hard to know whether user filters would do the right thing. I was thinking that the deref can be done outside filterRow(), e.g. in nextInternal(). How about:
1. call deref() for every element in results
2. call filterRow()
3. call ref() on the remaining elements in results
GC free memory management in Level-1 Block Cache Key: HBASE-5347 URL: https://issues.apache.org/jira/browse/HBASE-5347 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: D1635.5.patch On eviction of a block from the block-cache, instead of waiting for the garbage collector to reuse its memory, reuse the block right away. This will require us to keep reference counts on the HFile blocks. Once we have the reference counts in place we can do our own simple blocks-out-of-slab allocation for the block-cache. This will help us with * reducing gc pressure, especially in the old generation * making it possible to have non-java-heap memory backing the HFile blocks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
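A compact sketch of steps 1-3 above. RefCountedCell and keep() are stand-ins for the hypothetical ref()/deref() hooks and for whatever the user filter decides; none of these names exist in the HBase code base, so this only illustrates the proposed ordering:
{code}
import java.util.Iterator;
import java.util.List;

class FilterRowRefCountSketch {
  interface RefCountedCell { void ref(); void deref(); boolean keep(); }

  static void filterRow(List<RefCountedCell> results) {
    for (RefCountedCell c : results) c.deref();      // 1. release every pin first
    for (Iterator<RefCountedCell> it = results.iterator(); it.hasNext();) {
      if (!it.next().keep()) it.remove();             // 2. the user filter drops cells
    }
    for (RefCountedCell c : results) c.ref();         // 3. re-pin the survivors
  }
}
{code}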