[jira] [Updated] (HBASE-5474) [89-fb] Share the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Summary: [89-fb] Share the multiput thread pool for all the HTable instance (was: Share the multiput thread pool for all the HTable instance)

Key: HBASE-5474
URL: https://issues.apache.org/jira/browse/HBASE-5474
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. Also, single put requests are processed in the calling thread instead of the thread pool.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
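The sharing idea above can be sketched as a single process-wide pool. This is a hypothetical illustration, not HBase's actual implementation; the class and method names are invented, but the pool parameters mirror the description: a cached-style pool whose maximum size is the region server count, so the worst case no longer multiplies by the number of HTable instances.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one shared multiput pool for all HTable instances,
// instead of one cached thread pool per instance.
public class SharedMultiPutPool {
    private static volatile ExecutorService pool;

    public static ExecutorService get(int numRegionServers) {
        if (pool == null) {
            synchronized (SharedMultiPutPool.class) {
                if (pool == null) {
                    // Same shape as a cached thread pool (direct handoff,
                    // 60s idle timeout) but capped at the region server count.
                    pool = new ThreadPoolExecutor(
                        0, numRegionServers,
                        60L, TimeUnit.SECONDS,
                        new SynchronousQueue<Runnable>());
                }
            }
        }
        return pool;
    }

    // Worst-case thread count with one pool per HTable instance...
    public static int maxThreadsPerInstancePools(int regionServers, int htables) {
        return regionServers * htables;
    }

    // ...versus a single shared pool, independent of the instance count.
    public static int maxThreadsShared(int regionServers, int htables) {
        return regionServers;
    }
}
```

Every call to `get` returns the same bounded executor, which is what raises utilization: idle worker threads can be reused by any HTable instance rather than sitting in a per-instance pool.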
[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Description: Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. Also, single put requests are processed in the calling thread instead of the thread pool.

was: The same description without the final sentence about single put requests.
[jira] [Updated] (HBASE-5474) Share the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Summary: Share the multiput thread pool for all the HTable instance (was: Shared the multiput thread pool for all the HTable instance)
[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Description: Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently.

was: Currently, each HTable instance has its own thread pool for the multiput operation, which is an unbounded cached thread pool. So it would increase efficiency if this unbounded cached thread pool could be shared across all the HTable instances.
[jira] [Updated] (HBASE-5438) A tool to check region balancing for a particular table
[ https://issues.apache.org/jira/browse/HBASE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5438:
------------------------------
Attachment: 0001-hbase-5438.patch

Key: HBASE-5438
URL: https://issues.apache.org/jira/browse/HBASE-5438
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: 0001-hbase-5438.patch, D1827.1.patch, D1827.1.patch, D1827.1.patch

While debugging a table-level region imbalance problem, I wrote a tool to check how the regions of a particular table are balanced across all the region servers.

bin/hbase org.jruby.Main region_balance_checker.rb test_table
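The core computation such a checker performs can be sketched as follows. This is a hypothetical illustration in Java (the actual tool is a JRuby script); the class and method names are invented. Given the region-to-server assignment for one table, it counts regions per server and reports the spread between the most and least loaded servers.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a region balance check for one table.
public class RegionBalanceChecker {
    // Count how many of the table's regions each server hosts.
    public static Map<String, Integer> regionsPerServer(Map<String, String> regionToServer) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String server : regionToServer.values()) {
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    // The table is imbalanced when the gap between the most and least
    // loaded servers (for this table's regions) is large.
    public static int spread(Map<String, Integer> counts) {
        return Collections.max(counts.values()) - Collections.min(counts.values());
    }
}
```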
[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5407:
------------------------------
Description: It would be nice to show the per-region request/sec count in the web UI, especially when debugging a hot region problem. (was: It would be nice to show the per-region request count in the web UI, especially when debugging a hot region problem.)
Summary: Show the per-region level request/sec count in the web ui (was: Show the per-region level request count in the web ui)

Key: HBASE-5407
URL: https://issues.apache.org/jira/browse/HBASE-5407
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
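The difference between the old and new metric is a raw counter versus a rate. A per-region request/sec figure can be derived from the counter by sampling it periodically, as in this hypothetical sketch (class and method names invented, not HBase's actual metrics code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: derive requests/sec for one region from a raw
// request counter sampled once per reporting period.
public class RegionRequestRate {
    private final AtomicLong requests = new AtomicLong();
    private long lastSampledCount;

    public void onRequest() {
        requests.incrementAndGet();
    }

    // Called once per reporting period; the rate is the counter delta
    // divided by the period length.
    public double sample(double periodSeconds) {
        long now = requests.get();
        long delta = now - lastSampledCount;
        lastSampledCount = now;
        return delta / periodSeconds;
    }
}
```

A rate makes a hot region stand out immediately, whereas a lifetime request count mostly reflects how long the region has been open.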
[jira] [Updated] (HBASE-5369) Compaction selection based on the hotness of the HFile's block in the block cache
[ https://issues.apache.org/jira/browse/HBASE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5369:
------------------------------
Description: HBase reserves a large amount of memory for the block cache, and cached blocks are aged out in an LRU fashion. Obviously, we don't want to age out blocks that are still hot. However, when compactions start, these hot blocks may naturally become invalid. Since the block cache already knows which HFiles the hot blocks come from, the compaction selection algorithm could simply skip compacting those HFiles until their blocks become cold. For example, if 80% of an HFile's blocks are cached, the HFile is clearly hot, so just skip it during compaction selection. The hot-block percentage should be configured as a high bar to make sure HBase still makes progress on compactions.

was: The same description without the 80% example and the note about configuring the percentage as a high bar.

Key: HBASE-5369
URL: https://issues.apache.org/jira/browse/HBASE-5369
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
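The proposed selection rule reduces to a single predicate per candidate HFile. The sketch below is hypothetical (names invented, not the actual HBase compaction code): an HFile whose cached-block ratio meets or exceeds a configured, deliberately high threshold is treated as hot and excluded from selection.

```java
// Hypothetical sketch of hotness-based compaction selection: skip an
// HFile while a high fraction of its blocks is still in the block cache.
public class HotFileCompactionFilter {
    private final double hotRatio; // e.g. 0.8 => skip when >= 80% of blocks are cached

    public HotFileCompactionFilter(double hotRatio) {
        this.hotRatio = hotRatio;
    }

    // cachedBlocks / totalBlocks is the HFile's "hotness" as seen by the
    // block cache; at or above the threshold, defer compacting this file.
    public boolean shouldSkip(int cachedBlocks, int totalBlocks) {
        if (totalBlocks == 0) {
            return false; // nothing cached, nothing to protect
        }
        return (double) cachedBlocks / totalBlocks >= hotRatio;
    }
}
```

Setting `hotRatio` high (as the description suggests) keeps the filter from starving compactions: only genuinely hot files are deferred, and they become eligible again once their blocks age out of the cache.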
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: (was: HBASE-5199.patch)

Key: HBASE-5199
URL: https://issues.apache.org/jira/browse/HBASE-5199
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1311.1.patch, D1311.2.patch, D1311.3.patch, D1311.4.patch, D1311.5.patch, D1311.5.patch

Currently, HBase deletes out-of-TTL store files after compaction. We can change the sequence to delete out-of-TTL store files before selecting store files for compaction. This way, HBase keeps deleting old invalid store files without compacting them, and also avoids unnecessary compactions, since the out-of-TTL store files are deleted before compaction selection.
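The reordering rests on one observation: if the newest cell in a store file is already past the column family's TTL, every cell in that file is expired and the whole file can be dropped without rewriting it. A hypothetical sketch of that pre-selection filter (names invented, not the actual HBase store code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: identify store files that are entirely out of TTL,
// so they can be deleted *before* compaction selection runs.
public class ExpiredStoreFileFilter {
    // maxTimestamps: the newest cell timestamp (ms) recorded per store file.
    public static List<Long> selectExpired(List<Long> maxTimestamps, long ttlMs, long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (long maxTs : maxTimestamps) {
            // Every cell in the file is at or older than maxTs, so if maxTs
            // itself is out of TTL the whole file is invalid.
            if (nowMs - maxTs > ttlMs) {
                expired.add(maxTs);
            }
        }
        return expired;
    }
}
```

Running this before selection is what avoids the unnecessary compactions: an expired file never enters a candidate set, so no compaction is scheduled just to throw its contents away.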
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: hbase-5199.patch

The new patch is rebased on the latest trunk and all the unit tests pass.
[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------
Attachment: HBASE-5259.patch

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks, so each mapper task can process a region or part of a region. Ideally, a mapper task should run on the same machine on which the region server hosts the corresponding region. That is the motivation for TableInputFormat setting the RegionLocation: so the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the region server's host name may have a different format from the task tracker's (mapper task's) host name. The task tracker always gets its host name by reverse DNS lookup, and the DNS service may return a different host name format. For example, the region server's host name is correctly set as a.b.c.d, while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). So the solution is to set the RegionLocation by reverse DNS lookup as well. No matter what host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
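The normalization described above can be sketched in two pieces: resolve the region server host through the same reverse DNS path the task tracker uses (`InetAddress.getCanonicalHostName()` performs a reverse lookup), and make trailing-dot forms like "a.b.c.d." compare equal to "a.b.c.d". This is a hypothetical illustration, not the patch's actual code; the class and method names are invented.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of host name normalization for split locations.
public class HostNameNormalizer {
    // A reverse DNS answer may carry a trailing dot (the DNS root label);
    // strip it so both forms compare equal.
    public static String stripTrailingDot(String host) {
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    // Resolve through the same reverse-lookup path the MapReduce side uses
    // for its own host name, so the two formats agree.
    public static String normalize(String regionServerHost) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(regionServerHost);
        return stripTrailingDot(addr.getCanonicalHostName());
    }
}
```

With both sides normalized the same way, the scheduler's string comparison between the split's location and the task tracker's host succeeds, and the map task actually lands on the region's host.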
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: HBASE-5199.patch
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Release Note: Set hbase.store.delete.expired.storefile to true to enable expired store file deletion.
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Description: Currently, HBase deletes out-of-TTL store files after compaction. We can change the sequence to delete out-of-TTL store files before selecting store files for compaction. This way, HBase keeps deleting old invalid store files without compacting them, and also avoids unnecessary compactions, since the out-of-TTL store files are deleted before compaction selection.

was: The same description with "compaction" qualified as "major compaction" throughout.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Key: HBASE-5033
URL: https://issues.apache.org/jira/browse/HBASE-5033
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch

Region servers open/close each store, and each store file within every store, sequentially, which can make region open/close inefficient. This diff opens/closes stores in parallel to reduce region open/close time; it also helps reduce cluster restart time.
1) Opening each store in parallel
2) Loading each store file for every store in parallel
3) Closing each store in parallel
4) Closing each store file for every store in parallel
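Step 1 above (opening each store in parallel) can be sketched as a fan-out/fan-in over a bounded pool. This is a hypothetical illustration, not the patch's actual code; the interface and class names are invented, with a string standing in for an opened store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: open every store of a region concurrently instead
// of one after another, then wait for all of them before the region is
// considered open. Steps 2-4 follow the same fan-out/fan-in pattern.
public class ParallelStoreOpener {
    public interface StoreOpen {
        String open() throws Exception;
    }

    public static List<String> openAll(List<StoreOpen> stores, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Fan out: one open task per store.
            List<Future<String>> futures = new ArrayList<>();
            for (StoreOpen s : stores) {
                futures.add(pool.submit(s::open));
            }
            // Fan in: the region open completes only when every store is open.
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    opened.add(f.get());
                } catch (Exception e) {
                    throw new RuntimeException("store open failed", e);
                }
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

With `threads` set to 1 (the patch's reported default) this degenerates to the original sequential behavior, which is why the change is safe to ship disabled.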
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: (was: HBASE-5033-apach-trunk.patch)
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033-apach-trunk.patch

1) Rebased on the recent trunk; the patch is generated with --no-prefix.
2) The default number of threads is set to 1.
3) Performance evaluation: the results will vary with the cluster environment, such as the number of regions and the number of store files per region. A simple restart test shows a single region server's (22 regions) restart time decreased from 78 sec to 55 sec, roughly a 29% saving in region server restart time. The cluster (100 nodes) restart time decreased from 316 sec to 189 sec, around a 40% saving.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: (was: HBASE-5033-apach-trunk.patch)
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033.patch

Resubmit the patch. Thanks Ted for the correction.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033-apach-trunk.patch

The patch is based on the Apache trunk. It passes all the unit tests except TestCoprocessorEndpoint, which fails with or without this change.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Status: Patch Available (was: Open) Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5032) Add other DELETE or DELETE into the delete bloom filter
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5032: -- Description: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. was: Previously, the delete family bloom filter only contained the row keys that have a delete family. It helps us avoid the top-row seek operation. This jira attempts to add the delete column into this delete bloom filter as well (renaming the delete family bloom filter to the delete bloom filter). The motivation is to save seek ops for scan time-range queries if we know there is no delete column for this row/column. We can seek directly to the exact timestamp we are interested in, instead of seeking to the latest timestamp and skipping repeatedly to find out whether there is any delete column before the timestamp of interest. 
Summary: Add other DELETE or DELETE into the delete bloom filter (was: Add DELETE COLUMN into the delete bloom filter) Add other DELETE or DELETE into the delete bloom filter Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values.
[jira] [Updated] (HBASE-5032) Add other DELETE type information into the delete bloom filter to optimize the time range query
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5032: -- Description: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or deletes. was: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. 
From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. Summary: Add other DELETE type information into the delete bloom filter to optimize the time range query (was: Add other DELETE or DELETE into the delete bloom filter) Add other DELETE type information into the delete bloom filter to optimize the time range query --- Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or deletes.
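The seek decision described above can be sketched as follows. This is a minimal illustration, not HBase code: the Bloom filter is modeled as a plain `Set`, and all names (`TimeRangeSeek`, `chooseSeekTimestamp`) are hypothetical. The key property used is that a Bloom filter can return false positives but never false negatives, so a "not present" answer makes the optimized seek safe.

```java
import java.util.Set;

// Sketch: choose where a time-range scan should seek next, given a
// (row, column)-keyed delete Bloom filter (modeled here as a Set).
public class TimeRangeSeek {
    static long chooseSeekTimestamp(Set<String> deleteBloom, String row,
                                    String col, long timerangeMax, long maxTs) {
        // "Not present" is authoritative for a Bloom filter, so we may
        // skip directly to the requested range's maximum timestamp.
        if (!deleteBloom.contains(row + "/" + col)) {
            return timerangeMax; // no DeleteColumn possible above the range
        }
        // Possible delete marker with a higher timestamp: fall back to
        // seeking the top of the row/column to check for it.
        return maxTs;
    }
}
```

A query on a (row, column) pair absent from the filter seeks straight to `timerange_max`; a possible hit forces the conservative seek to the top of the column.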
[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.
[ https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4698: -- Status: Patch Available (was: Open) Let the HFile Pretty Printer print all the key values for a specific row. - Key: HBASE-4698 URL: https://issues.apache.org/jira/browse/HBASE-4698 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch When using the HFile Pretty Printer to debug HBase issues, it would be very nice to allow the Pretty Printer to seek to a specific row and print only the key values for that row.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Description: 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. was: Opening store files in parallel to reduce region open time Summary: Opening/Closing store in parallel to reduce region open/close time (was: Opening store files in parallel to reduce region open time) Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Description: Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. was: 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D933.1.patch Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
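The fan-out pattern in steps 1-4 above can be sketched with a plain `ExecutorService`. This is an illustrative sketch only: `Store` here is a hypothetical one-method interface standing in for the region server's store internals, and the actual patch naturally differs in structure and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open all stores of a region in parallel on a fixed-size pool,
// then block until every store has finished opening.
public class ParallelStoreOpener {
    interface Store { String open(); } // hypothetical stand-in type

    static List<String> openAll(List<Store> stores, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Store s : stores) {
                futures.add(pool.submit(s::open)); // each store opens on its own thread
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // wait for all opens; rethrows any failure
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

Step 2 (loading each store file within a store) is the same pattern one level down; closing mirrors opening with a close task per store.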
[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.
[ https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4698: -- Attachment: HBASE-4689-trunk.patch The diff based on 89-fb has been accepted and committed, so this patch was generated by rebasing on the latest Apache trunk. Let the HFile Pretty Printer print all the key values for a specific row. - Key: HBASE-4698 URL: https://issues.apache.org/jira/browse/HBASE-4698 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch When using the HFile Pretty Printer to debug HBase issues, it would be very nice to allow the Pretty Printer to seek to a specific row and print only the key values for that row.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: (was: hbase-4689.patch) [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: (was: hbase-4689.patch) [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4689.patch In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: hbase-4689.patch [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4689.patch In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4191) hbase load balancer needs locality awareness
[ https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4191: -- Description: Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide HFile-level locality information. But in order to work with the load balancer and region assignment, we need region-level locality information. Let's define the region locality information first; it is almost the same as the HFile locality index. HRegion locality index (HRegion A, RegionServer B) = (Total number of HDFS blocks that can be retrieved locally by RegionServer B for HRegion A) / (Total number of HDFS blocks for Region A) So the HRegion locality index tells us how much locality we can get if the HMaster assigns HRegion A to RegionServer B. There are 2 steps involved in assigning regions based on locality. 1) During cluster startup, the master will scan HDFS to calculate the HRegion locality index for each pair of HRegion and RegionServer. Scanning DFS is pretty expensive, so we only need to do this once, during startup. 2) During cluster run time, each region server will update the HRegion locality index periodically as metrics, as HBASE-4114 did. The RegionServer can expose them to the Master through ZK, the meta table, or just RPC messages. Based on the HRegion locality index, the assignment manager in the master would have global knowledge about the region locality distribution. Treating the HRegion locality indices as capacities between the region server set and the region set, the assignment manager could run a MAXIMUM FLOW solver to reach the global optimum. In addition, HBASE-4491 (Locality Checker) is a tool, based on the same metrics, that proactively scans DFS to calculate the global locality information in the cluster. It helps us verify data locality information at run time. 
was: HBASE-4114 implemented getTopBlockLocations(). The load balancer should utilize this method and assign the region to be moved to the region server with the highest block affinity. Issue Type: New Feature (was: Improvement) Summary: hbase load balancer needs locality awareness (was: Utilize getTopBlockLocations in load balancer) hbase load balancer needs locality awareness Key: HBASE-4191 URL: https://issues.apache.org/jira/browse/HBASE-4191 Project: HBase Issue Type: New Feature Reporter: Ted Yu Assignee: Liyin Tang Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide HFile-level locality information. But in order to work with the load balancer and region assignment, we need region-level locality information. Let's define the region locality information first; it is almost the same as the HFile locality index. HRegion locality index (HRegion A, RegionServer B) = (Total number of HDFS blocks that can be retrieved locally by RegionServer B for HRegion A) / (Total number of HDFS blocks for Region A) So the HRegion locality index tells us how much locality we can get if the HMaster assigns HRegion A to RegionServer B. There are 2 steps involved in assigning regions based on locality. 1) During cluster startup, the master will scan HDFS to calculate the HRegion locality index for each pair of HRegion and RegionServer. Scanning DFS is pretty expensive, so we only need to do this once, during startup. 2) During cluster run time, each region server will update the HRegion locality index periodically as metrics, as HBASE-4114 did. The RegionServer can expose them to the Master through ZK, the meta table, or just RPC messages. Based on the HRegion locality index, the assignment manager in the master would have global knowledge about the region locality distribution. 
Treating the HRegion locality indices as capacities between the region server set and the region set, the assignment manager could run a MAXIMUM FLOW solver to reach the global optimum. In addition, HBASE-4491 (Locality Checker) is a tool, based on the same metrics, that proactively scans DFS to calculate the global locality information in the cluster. It helps us verify data locality information at run time.
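The locality-index formula defined above is a simple ratio, sketched here directly. The types are illustrative (a map from server name to its locally-servable block count for one region), not HBase's actual data structures.

```java
import java.util.Map;

// Sketch of: HRegion locality index (A, B) =
//   (HDFS blocks of region A servable locally by server B) / (total blocks of A)
public class RegionLocality {
    static double localityIndex(Map<String, Integer> localBlocksByServer,
                                String server, int totalBlocks) {
        if (totalBlocks == 0) {
            return 0.0; // empty region: no locality to speak of
        }
        return localBlocksByServer.getOrDefault(server, 0) / (double) totalBlocks;
    }
}
```

For example, a region with 10 blocks of which 8 are local to RegionServer B has a locality index of 0.8 for B, and 0 for a server holding none of its blocks.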
[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up ROWCOL bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4469: -- Summary: Avoid top row seek by looking up ROWCOL bloomfilter (was: Avoid top row seek by looking up bloomfilter) Avoid top row seek by looking up ROWCOL bloomfilter --- Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.0 Attachments: HBASE-4469_1.patch The problem is that when seeking for the row/col in the hfile, we will go to the top of the row in order to check for a row delete marker (delete family). However, if the bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description: The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we only get the extra 10% seek savings if the ROWCOL bloom filter is enabled [HBASE-4469]. After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get the extra 10% seek savings for ALL kinds of bloom filters. was: HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization:
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Attachment: hbase-4532-89-fb.patch Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, hbase-4532-89-fb.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. 
Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled [HBASE-4469]. After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get about 10% more seek savings for ALL kinds of bloom filters.
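The empty-column guard described above can be sketched as a small predicate. This is an illustrative model, not HBase code: the delete family Bloom filter is a `Set` of row keys, and the names (`TopRowSeekGuard`, `canSkipTopRowSeek`) are hypothetical.

```java
import java.util.Set;

// Sketch: the top-row-seek optimization is only safe when the query names a
// concrete column AND the delete family Bloom filter reports no delete
// family markers for the row.
public class TopRowSeekGuard {
    static boolean canSkipTopRowSeek(Set<String> deleteFamilyBloom,
                                     String row, String column) {
        if (column == null || column.isEmpty()) {
            // A GET/SCAN on the empty column wants the top-row KV itself
            // (e.g. row1/cf1:/1/put), so the optimization must be disabled.
            return false;
        }
        // A definite "no delete family" from the Bloom filter means the
        // top-of-row check for delete markers is unnecessary.
        return !deleteFamilyBloom.contains(row);
    }
}
```

Rows present in the filter, and any empty-column query, fall back to the normal top-row seek; everything else skips it, which is where the extra savings in the numbers above come from.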
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Attachment: HBASE-4532-apache-trunk.patch Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. 
Evaluation from TestSeekOptimization:

Previously:
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we could get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled [HBASE-4469].

After this change:
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings for ALL kinds of bloom filters.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: (was: HBASE-4418_1.patch)

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: (was: HBASE-4418_2.patch)

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: hbase-4418-apache-trunk.patch

The patch uses reflection to check whether the Hadoop version supports showing the configuration in the /conf servlet (HADOOP-6408). If Hadoop supports HADOOP-6408, the master and region server web UIs will have a link to show the HBase configuration.

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4418-apache-trunk.patch

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
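The reflection probe described in the HBASE-4418 update can be sketched roughly as below. This is only an illustration of the general technique (look a class up by name and report whether it is on the classpath); the `ConfServlet` class name in the comment is an assumption about what the real patch probes, not confirmed by this digest.

```java
// Minimal sketch of a reflection-based capability check, in the spirit of
// the HADOOP-6408 probe mentioned above. Not the actual HBase patch code.
public class ConfServletCheck {

    /** Returns true if the named class can be resolved on the classpath. */
    public static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "org.apache.hadoop.conf.ConfServlet" is assumed to be the servlet
        // added by HADOOP-6408; whether it resolves depends on the Hadoop jar.
        System.out.println(classPresent("org.apache.hadoop.conf.ConfServlet"));
    }
}
```

The web UI would only render the configuration link when the probe succeeds, so older Hadoop versions degrade gracefully.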
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: (was: hbase-4585-apache.patch)

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: (was: hbase-4585-trunk.patch)

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-apache.patch

The patch for apache-trunk is ready.

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-apache-trunk.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch, hbase-4585-apache-trunk.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
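The dispatch rule described in the HBASE-4585 notifications above can be sketched as follows. This is a minimal illustration, not the actual ScanQueryMatcher code; the enum and method names are hypothetical. The key observation is that a family or column delete hides every remaining version of the column, so the scanner can jump straight to the next column, while a version delete hides only one kv.

```java
// Sketch of the proposed delete handling: map the kind of delete that covers
// the current kv to the scanner action. Names here are illustrative only.
public class DeleteDispatch {

    public enum DeleteType { FAMILY_DELETED, COLUMN_DELETED, VERSION_DELETED }

    public enum MatchCode { SEEK_NEXT_COL, SKIP }

    public static MatchCode onDeleted(DeleteType type) {
        switch (type) {
            case FAMILY_DELETED:
            case COLUMN_DELETED:
                // No later version of this column can match: seek past it.
                return MatchCode.SEEK_NEXT_COL;
            default:
                // Only this particular version is hidden: a plain skip suffices.
                return MatchCode.SKIP;
        }
    }
}
```

Replacing a skip-and-keep-seeking loop with a single seek-to-next-column avoids touching every remaining version of a deleted column.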
[jira] [Updated] (HBASE-4592) Get CacheStats request count based on the HFile block type
[ https://issues.apache.org/jira/browse/HBASE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4592: -- Summary: Get CacheStats request count based on the HFile block type (was: Get HFile request count based on the block type)

Get CacheStats request count based on the HFile block type
---
Key: HBASE-4592
URL: https://issues.apache.org/jira/browse/HBASE-4592
Project: HBase
Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, CacheStats can only report the total request count across all block types. We can break this metric down by block type, which gives us more fine-grained metrics.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The solution to the above problem is to disable this optimization when we GET/SCAN a row with an empty column.

was:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column. The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in. The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.
Summary: Avoid top row seek by dedicated bloom filter for delete family bloom filter (was: Avoid top row seek by dedicated bloom filter for row with empty column)

Avoid top row seek by dedicated bloom filter for delete family bloom filter
---
Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The solution to the above problem is to disable this optimization when we GET/SCAN a row with an empty column.
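The guard described in this HBASE-4532 update can be condensed into a single predicate. The sketch below is hypothetical (not HBase's real API): the delete-family bloom shortcut may only fire when the bloom filter reports no delete family AND the query names a concrete column, because a GET/SCAN for the empty column is itself asking for the very top-row kv the shortcut would skip.

```java
// Hypothetical helper capturing the fix: the optimization is disabled for
// queries on a row with an empty column.
public class TopRowSeekGuard {

    /**
     * True if the scanner may safely skip the seek to the top of the row.
     *
     * @param bloomSaysNoDeleteFamily negative delete-family bloom lookup
     * @param queryHasEmptyColumn     the GET/SCAN targets the empty column
     */
    public static boolean canSkipTopRowSeek(boolean bloomSaysNoDeleteFamily,
                                            boolean queryHasEmptyColumn) {
        return bloomSaysNoDeleteFamily && !queryHasEmptyColumn;
    }
}
```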
[jira] [Updated] (HBASE-4592) Break down request count based on the HFile block type
[ https://issues.apache.org/jira/browse/HBASE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4592: -- Summary: Break down request count based on the HFile block type (was: Get CacheStats request count based on the HFile block type)

Break down request count based on the HFile block type
---
Key: HBASE-4592
URL: https://issues.apache.org/jira/browse/HBASE-4592
Project: HBase
Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, CacheStats can only report the total request count across all block types. We can break this metric down by block type, which gives us more fine-grained metrics.
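The per-block-type accounting proposed in HBASE-4592 might look roughly like the sketch below. The class, enum values, and method names are assumptions for illustration; the point is simply one counter per block type, with the old total still derivable as the sum.

```java
import java.util.EnumMap;

// Hypothetical sketch of finer-grained cache accounting: a request counter
// per HFile block type instead of a single total.
public class BlockTypeStats {

    public enum BlockType { DATA, INDEX, BLOOM, META }

    private final EnumMap<BlockType, Long> requests =
            new EnumMap<>(BlockType.class);

    public void recordRequest(BlockType type) {
        requests.merge(type, 1L, Long::sum);
    }

    public long requestCount(BlockType type) {
        return requests.getOrDefault(type, 0L);
    }

    /** The old, coarse metric is just the sum of the per-type counters. */
    public long totalRequestCount() {
        long total = 0L;
        for (long count : requests.values()) {
            total += count;
        }
        return total;
    }
}
```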
[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4469: -- Attachment: HBASE-4469_1.patch

Avoid top row seek by looking up bloomfilter
--
Key: HBASE-4469
URL: https://issues.apache.org/jira/browse/HBASE-4469
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4469_1.patch

The problem is that when seeking for a row/col in the hfile, we go to the top of the row in order to check for a row delete marker (delete family). However, if the bloom filter is enabled for the column family, then any row on which a delete family operation was done has already been added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
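The invariant behind HBASE-4469 can be modeled with a tiny sketch. A HashSet stands in for the bloom filter here (all names are illustrative, none are HBase's real API): a real bloom filter can return false positives but never false negatives, which is exactly why a negative lookup proves the top-row seek for delete-family markers is unnecessary.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the shortcut: every row that received a delete-family
// marker was added to the row bloom filter, so a negative membership test
// means there is no delete-family marker to find at the top of the row.
public class DeleteFamilyLookup {

    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    public void recordDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    /** True if the scanner must still seek to the top of the row. */
    public boolean needTopRowSeek(String row) {
        return rowsWithDeleteFamily.contains(row);
    }
}
```

With a real bloom filter a positive answer only means "maybe present", so the seek is performed on positives and safely skipped on negatives.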
[jira] [Updated] (HBASE-4568) Make zk dump jsp response more quickly
[ https://issues.apache.org/jira/browse/HBASE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4568: -- Description:

1) For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

2) Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

3) The recoverable zookeeper should do real exponential backoff when there is a connection loss exception, which would give hbase a much longer time window to recover from zk machine failures.

was: For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

Make zk dump jsp response more quickly
--
Key: HBASE-4568
URL: https://issues.apache.org/jira/browse/HBASE-4568
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4568.patch

1) For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

2) Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

3) The recoverable zookeeper should do real exponential backoff when there is a connection loss exception, which would give hbase a much longer time window to recover from zk machine failures.
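The "real exponential backoff" asked for in point 3 above is typically a delay that doubles per attempt up to a cap. The sketch below is an assumption about what such a policy could look like; the method name and constants are hypothetical, not RecoverableZooKeeper's actual code.

```java
// Hypothetical sketch of a capped exponential backoff for retrying after a
// zookeeper connection-loss exception.
public class ZkRetryBackoff {

    /**
     * Delay before the given retry attempt (0-based): baseMs * 2^attempt,
     * capped at maxMs. The shift amount is clamped to avoid long overflow.
     */
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20);
        return Math.min(delay, maxMs);
    }
}
```

A fixed retry delay hammers a recovering quorum member; doubling the delay spreads retries out and gives the cluster a longer window to recover, which is exactly the motivation stated in the issue.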
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-89.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-trunk.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch, hbase-4585-trunk.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for row with empty column
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column.

The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.

was: HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

Summary: Avoid top row seek by dedicated bloom filter for row with empty column (was: Avoid top row seek by dedicated bloom filter for delete family)

Avoid top row seek by dedicated bloom filter for row with empty column
--
Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column.
The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in. The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.
[jira] [Updated] (HBASE-4568) Make zk dump jsp response more quickly
[ https://issues.apache.org/jira/browse/HBASE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4568: -- Description:

For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

was: For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout.

Make zk dump jsp response more quickly
--
Key: HBASE-4568
URL: https://issues.apache.org/jira/browse/HBASE-4568
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout.
The connection will then block until it is established or an error occurs.
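The timeout point above can be illustrated with plain `java.net` calls: both the connect timeout and the read timeout (SO_TIMEOUT) default to zero, which java.net treats as "infinite", so a probe of a dead quorum member should set explicit positive values. The helper below is an illustration, not HBase's getServerStats() code; the timeout value is an assumption.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: prepare a socket for probing a zk quorum member with a bounded
// read timeout instead of the infinite default.
public class ZkProbeSocket {

    public static Socket newProbeSocket(int soTimeoutMs) throws SocketException {
        Socket socket = new Socket();      // unconnected socket
        socket.setSoTimeout(soTimeoutMs);  // bounded read timeout (0 = infinite)
        // The connect step would then also pass an explicit timeout, e.g.:
        // socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        return socket;
    }
}
```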
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: HBASE-4418_1.patch

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4418_1.patch

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.