[jira] [Commented] (HBASE-5776) HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253768#comment-13253768 ]

Liyin Tang commented on HBASE-5776:

@Todd, the HTableMultiplexer is designed to process put requests across different tables. All the puts, from all tables, are sharded into separate queues based on their destination region server. This helps batch more puts for each region server before the RPC request is sent out.

HTableMultiplexer

Key: HBASE-5776
URL: https://issues.apache.org/jira/browse/HBASE-5776
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D2775.1.patch, D2775.1.patch, D2775.2.patch, D2775.2.patch

There is a known issue in the HBase client: a single slow or dead region server can slow down multiput operations across all the region servers, so the HBase client is only as fast as the slowest region server in the cluster. To solve this problem, HTableMultiplexer separates the multiput submitting threads from the flush threads, which makes the multiput a nonblocking operation. The submitting thread shards all the puts into different queues based on their destination region server and returns immediately. The flush threads flush the puts from each queue to its destination region server. Currently HTableMultiplexer supports only the put operation.

-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
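Below is a minimal Java sketch of the submit/flush split described above. It is not the HTableMultiplexer API. `SimplePut`, `MultiplexerSketch`, the `serverFor` region lookup, and the `sender` callback (which stands in for the multiput RPC) are all hypothetical.

Submitting only enqueues into a bounded per-server queue. When that queue is full, it returns `false` instead of blocking. Each `flush` call drains one server's queue, so a slow server delays only its own batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical put: destination table plus row key (cell values omitted). */
final class SimplePut {
  final String table;
  final String row;
  SimplePut(String table, String row) { this.table = table; this.row = row; }
}

/** Sketch: puts from any table are sharded into one bounded queue per region server. */
final class MultiplexerSketch {
  private final Map<String, LinkedBlockingQueue<SimplePut>> queues = new ConcurrentHashMap<>();
  private final Function<SimplePut, String> serverFor;      // hypothetical region -> server lookup
  private final BiConsumer<String, List<SimplePut>> sender; // hypothetical multiput RPC
  private final int queueCapacity;

  MultiplexerSketch(Function<SimplePut, String> serverFor,
                    BiConsumer<String, List<SimplePut>> sender, int queueCapacity) {
    this.serverFor = serverFor;
    this.sender = sender;
    this.queueCapacity = queueCapacity;
  }

  /** Non-blocking submit: returns false instead of waiting when the server's queue is full. */
  boolean put(SimplePut p) {
    String server = serverFor.apply(p);
    return queues.computeIfAbsent(server, s -> new LinkedBlockingQueue<>(queueCapacity)).offer(p);
  }

  /** One flush pass for one server; a dedicated flush thread would call this in a loop. */
  int flush(String server, int maxBatch) {
    LinkedBlockingQueue<SimplePut> q = queues.get(server);
    if (q == null) return 0;
    List<SimplePut> batch = new ArrayList<>();
    q.drainTo(batch, maxBatch);
    if (!batch.isEmpty()) sender.accept(server, batch);
    return batch.size();
  }
}
```

Puts from different tables that land on the same region server end up in one batch. This is the batching benefit described in the comment.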
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210455#comment-13210455 ]

Liyin Tang commented on HBASE-5407:

Hi Stack. This patch adds the total read/write request count and the read/write requests per second for each region in the 89-fb branch. For Apache trunk, I will only need to add the read/write requests per second.

Show the per-region level request/sec count in the web ui

Key: HBASE-5407
URL: https://issues.apache.org/jira/browse/HBASE-5407
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1779.1.patch, D1779.1.patch, D1779.1.patch

It would be nice to show the per-region request/sec count in the web UI, especially when debugging hot region problems.
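Assume the region server already keeps cumulative read and write request counters per region (the totals mentioned above). A requests/sec figure can then be derived by sampling those counters periodically. The class and method names below are hypothetical. This is a sketch, not the code in the patch.

```java
/** Sketch: derive a per-region requests/sec figure from a cumulative request counter. */
final class RegionRequestRate {
  private long lastCount;
  private long lastMillis;
  private double rate;

  RegionRequestRate(long initialCount, long nowMillis) {
    this.lastCount = initialCount;
    this.lastMillis = nowMillis;
  }

  /** Called periodically (e.g. by a metrics thread) with the current cumulative total. */
  double update(long totalCount, long nowMillis) {
    long elapsed = nowMillis - lastMillis;
    if (elapsed > 0) { // keep the previous rate if no time has passed
      rate = (totalCount - lastCount) * 1000.0 / elapsed;
      lastCount = totalCount;
      lastMillis = nowMillis;
    }
    return rate;
  }
}
```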
[jira] [Commented] (HBASE-5407) Show the per-region level request count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208742#comment-13208742 ]

Liyin Tang commented on HBASE-5407:

Awesome! Thanks, Jean. I think I just need to port this patch.

Show the per-region level request count in the web ui

Key: HBASE-5407
URL: https://issues.apache.org/jira/browse/HBASE-5407
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

It would be nice to show the per-region request count in the web UI, especially when debugging hot region problems.
[jira] [Commented] (HBASE-5403) Checkpoint the compressed HLog
[ https://issues.apache.org/jira/browse/HBASE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208831#comment-13208831 ]

Liyin Tang commented on HBASE-5403:

@Nicolas, the DFS block size is usually set quite large, say 256 MB, and it is inefficient to write small log files that are smaller than one DFS block. I assume this is the main benefit of checkpointing versus rolling the log.

Checkpoint the compressed HLog

Key: HBASE-5403
URL: https://issues.apache.org/jira/browse/HBASE-5403
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

Let's assume that HBase replication can be based on replaying the HLog in the replica cluster. The replica process could crash during the replay, so it needs a way to restart from the latest checkpoint in the HLog, even when the HLog is compressed. The proposal is therefore to write a series of checkpoints within the HLog. Each checkpoint will have a header containing a special sequence of bytes. Each segment between checkpoints should be compressed with a new dictionary, so that decompression can start at any checkpoint without the dictionary state built up earlier in the file.
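The checkpoint layout in the proposal can be sketched as below, under several hypothetical assumptions:

- the four-byte `MAGIC` marker,
- the one-byte entry tags, and
- the toy dictionary of region names.

A real format would also have to guarantee that the marker cannot occur inside entry data. The sketch shows that the dictionary is cleared at every checkpoint. A replayer can therefore jump to the last marker and decode from there with no earlier state.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical checkpointed, dictionary-compressed log writer (not the HLog format). */
final class CheckpointedLogSketch {
  // Hypothetical marker; a real format must ensure these bytes never appear inside entries.
  static final byte[] MAGIC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xC0, (byte) 0xDE};

  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  private final Map<String, Integer> dictionary = new HashMap<>();

  /** Starts a new checkpoint: writes the marker and forgets the previous dictionary. */
  void checkpoint() {
    out.write(MAGIC, 0, MAGIC.length);
    dictionary.clear();
  }

  /** Writes one entry as a dictionary reference if seen since the last checkpoint, else as a literal. */
  void append(String regionName) {
    Integer idx = dictionary.get(regionName);
    if (idx != null) {
      out.write(1);               // tag: dictionary reference
      out.write(idx.intValue());  // sketch: assumes fewer than 256 dictionary entries
    } else {
      byte[] b = regionName.getBytes(StandardCharsets.UTF_8);
      out.write(0);               // tag: literal
      out.write(b.length);        // sketch: assumes names shorter than 256 bytes
      out.write(b, 0, b.length);
      dictionary.put(regionName, dictionary.size());
    }
  }

  byte[] bytes() { return out.toByteArray(); }

  /** Offset of the last marker, or -1; replay restarts there with an empty dictionary. */
  static int lastCheckpointOffset(byte[] log) {
    for (int i = log.length - MAGIC.length; i >= 0; i--) {
      boolean match = true;
      for (int j = 0; j < MAGIC.length && match; j++) {
        match = log[i + j] == MAGIC[j];
      }
      if (match) return i;
    }
    return -1;
  }
}
```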
[jira] [Commented] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208995#comment-13208995 ]

Liyin Tang commented on HBASE-5407:

The total number is very useful, and it would be nice to add requests/sec to the web UI as well. I have updated the title and description of the JIRA. Thanks, Jean, for the heads-up.

Show the per-region level request/sec count in the web ui

Key: HBASE-5407
URL: https://issues.apache.org/jira/browse/HBASE-5407
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

It would be nice to show the per-region request/sec count in the web UI, especially when debugging hot region problems.
[jira] [Commented] (HBASE-5381) Make memstore.flush.size as a table level configuration
[ https://issues.apache.org/jira/browse/HBASE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205689#comment-13205689 ]

Liyin Tang commented on HBASE-5381:

Thanks, Jean and Ted. I missed something before. Please close this JIRA for me. Thanks a lot!

Make memstore.flush.size as a table level configuration

Key: HBASE-5381
URL: https://issues.apache.org/jira/browse/HBASE-5381
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently the region server flushes a region's memstore based on the global memstore flush size and the global low water mark. However, this causes hot tables, which serve more write traffic, to flush too frequently even though overall memstore heap usage is quite low. Flushing too frequently also leads to too many minor compactions. If memstore.flush.size were a table-level configuration, it would be more flexible. Different tables could then be configured with different memstore flush sizes based on compaction ratio, recovery time, and put ops.
[jira] [Commented] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204926#comment-13204926 ]

Liyin Tang commented on HBASE-5199:

Ping committers!

Delete out of TTL store files before compaction selection

Key: HBASE-5199
URL: https://issues.apache.org/jira/browse/HBASE-5199
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1311.1.patch, D1311.2.patch, D1311.3.patch, D1311.4.patch, D1311.5.patch, D1311.5.patch, HBASE-5199.patch

Currently, HBase deletes out-of-TTL store files after compaction. We can change the order and delete the out-of-TTL store files before selecting store files for compaction. This way, HBase can keep deleting old, invalid store files without compaction. It also avoids unnecessary compactions, since the out-of-TTL store files are deleted before compaction selection.
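The reordering proposed above amounts to a filter in front of compaction selection. The minimal sketch below assumes that each store file's newest cell timestamp is known, here through a hypothetical `FileSummary` class. A file is wholly expired when even its newest cell is older than `now - ttl`. `TtlFirstSelection` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical store file summary: name plus the newest cell timestamp it contains. */
final class FileSummary {
  final String name;
  final long maxTimestamp;
  FileSummary(String name, long maxTimestamp) { this.name = name; this.maxTimestamp = maxTimestamp; }
}

final class TtlFirstSelection {
  /**
   * Moves files whose newest cell is already past the TTL (so they hold only expired data)
   * into 'deleted' and returns the remaining files as compaction candidates.
   */
  static List<FileSummary> dropExpired(List<FileSummary> files, long ttlMillis, long nowMillis,
                                       List<FileSummary> deleted) {
    List<FileSummary> candidates = new ArrayList<>();
    for (FileSummary f : files) {
      if (f.maxTimestamp < nowMillis - ttlMillis) {
        deleted.add(f);   // a real store would remove or archive the file here
      } else {
        candidates.add(f);
      }
    }
    return candidates;
  }
}
```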
[jira] [Commented] (HBASE-5373) Table level lock to prevent the race of multiple table level operation
[ https://issues.apache.org/jira/browse/HBASE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205120#comment-13205120 ]

Liyin Tang commented on HBASE-5373:

Cool! That sounds like what I am trying to do here. I will take a look at Accumulo.

Table level lock to prevent the race of multiple table level operation

Key: HBASE-5373
URL: https://issues.apache.org/jira/browse/HBASE-5373
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

A table-level lock can guarantee that only one table operation happens at a time for each table. The master should acquire and release these table locks correctly across failover. One proposal is to keep track of each lock and its corresponding operation in ZooKeeper. If there is a master failover, the secondary master should have a way to check whether these operations succeeded before releasing the lock.
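As an illustration only, the lock semantics can be sketched with an in-memory map standing in for the ZooKeeper nodes the proposal describes. `TableLockSketch` and its methods are hypothetical. Recording the holding operation, rather than a bare flag, is what would let a failed-over master inspect that operation before releasing the lock.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory stand-in for the proposed ZooKeeper-backed table lock: each entry records
 * which operation holds the lock on a table.
 */
final class TableLockSketch {
  private final Map<String, String> holders = new ConcurrentHashMap<>();

  /** Atomically acquires the lock for 'table'; false if another operation already holds it. */
  boolean tryLock(String table, String operation) {
    return holders.putIfAbsent(table, operation) == null;
  }

  /** The operation currently holding the table lock, if any (what a new master would inspect). */
  Optional<String> holder(String table) {
    return Optional.ofNullable(holders.get(table));
  }

  /** Releases the lock only if 'operation' is the recorded holder. */
  boolean unlock(String table, String operation) {
    return holders.remove(table, operation);
  }
}
```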
[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196315#comment-13196315 ]

Liyin Tang commented on HBASE-5259:

@Ted, TableInputFormatBase in the mapred package has already been deprecated, as marked in the code, so there is no need to update the patch:

@Deprecated
public abstract class TableInputFormatBase {}

Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

Assuming HBase and MapReduce are running in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks. Each mapper task can then process a region or part of a region. Ideally, the mapper task should run on the same machine as the region server hosting the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). The solution is to set the RegionLocation through a reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
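A sketch of the normalization follows. It assumes a pluggable reverse-lookup function, which is hypothetical here. In a deployment it should be the same reverse-DNS call the task trackers use, so both sides produce identical strings, trailing dot included. The per-address cache is an added assumption meant to avoid one lookup per region; it is not taken from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Sketch: report split locations exactly as reverse DNS names them, so they match the
 * names task trackers obtain for themselves; lookups are cached per region server address.
 */
final class ReverseDnsLocations {
  private final Function<String, String> reverseLookup; // hypothetical: IP address -> PTR host name
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  ReverseDnsLocations(Function<String, String> reverseLookup) {
    this.reverseLookup = reverseLookup;
  }

  /** Location string to put into the input split for a region hosted at this address. */
  String locationFor(String regionServerAddress) {
    return cache.computeIfAbsent(regionServerAddress, reverseLookup);
  }
}
```

Caching matters because a table can have many regions on the same server. Without it, split computation would repeat the same reverse lookup once per region.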
[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195061#comment-13195061 ]

Liyin Tang commented on HBASE-5259:

Hi Ted, I totally understand your concern and appreciate your feedback. It would be nice to tolerate all kinds of DNS server failures, whether transient failures, loss of PTR records, or a DNS service crash. The tradeoff is to pick the most frequently occurring failure cases and try to tolerate them gracefully. From my perspective, for high-impact failures such as a DNS server crash, it is sometimes better to fire an alarm and fix the problem as soon as possible. For minor-impact failures, it would be great to recover naturally. For others, it is fine to pay some cost. If you believe the loss of PTR records is a normal failure case in your systems, I would encourage you to open a new JIRA to handle it properly across the HBase, DFS, and MapReduce code bases. I do believe we need a better fault-tolerance policy across all these dependent components.

Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch

Assuming HBase and MapReduce are running in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks. Each mapper task can then process a region or part of a region. Ideally, the mapper task should run on the same machine as the region server hosting the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). The solution is to set the RegionLocation through a reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195345#comment-13195345 ]

Liyin Tang commented on HBASE-5259:

Has that package been deprecated? Having two similar packages looks confusing to me.

Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

Assuming HBase and MapReduce are running in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks. Each mapper task can then process a region or part of a region. Ideally, the mapper task should run on the same machine as the region server hosting the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). The solution is to set the RegionLocation through a reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195360#comment-13195360 ]

Liyin Tang commented on HBASE-5259:

I see. I will generate another patch that includes /mapred. Thanks, Ted.

Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

Assuming HBase and MapReduce are running in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks. Each mapper task can then process a region or part of a region. Ideally, the mapper task should run on the same machine as the region server hosting the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). The solution is to set the RegionLocation through a reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
[jira] [Commented] (HBASE-5274) Filter out the expired store file scanner during the compaction
[ https://issues.apache.org/jira/browse/HBASE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192730#comment-13192730 ]

Liyin Tang commented on HBASE-5274:

I am not sure why Phabricator added so many duplicate comments. Sorry about the spam. @Todd, HBASE-5274 tries to avoid scanning any data from expired store file scanners, so compacting expired store files will be very cheap. And HBASE-5199 is actually related to HBASE-4717, which performs the age-out compaction.

Filter out the expired store file scanner during the compaction

Key: HBASE-5274
URL: https://issues.apache.org/jira/browse/HBASE-5274
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1407.1.patch, D1407.1.patch, D1407.1.patch, D1407.1.patch, D1407.1.patch

During compaction, HBase generates a store scanner that scans a list of store files. It would be more efficient to filter out the expired store files, since there is no need to read any key values from them. This optimization has already been implemented in 89-fb, and it is also the building block for HBASE-5199. Compacting expired store files should then be a no-op.
[jira] [Commented] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186059#comment-13186059 ]

Liyin Tang commented on HBASE-5199:

Thanks, Lars and Kannan. I will double-check this.

Delete out of TTL store files before compaction selection

Key: HBASE-5199
URL: https://issues.apache.org/jira/browse/HBASE-5199
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, HBase deletes out-of-TTL store files after major compaction. We can change the order and delete the out-of-TTL store files before selecting store files for compaction. This way, HBase can keep deleting old, invalid store files without major compaction. It also avoids unnecessary major compactions, since the out-of-TTL store files are deleted before compaction selection.
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184603#comment-13184603 ]

Liyin Tang commented on HBASE-5033:

Thanks, Ted. BTW, I did use --no-prefix for the most recently submitted patch.

Opening/Closing store in parallel to reduce region open/close time

Key: HBASE-5033
URL: https://issues.apache.org/jira/browse/HBASE-5033
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch

Region servers open and close each store, and each store file within every store, sequentially, which makes opening and closing regions inefficient. This diff opens and closes stores in parallel to reduce region open/close time, which also helps reduce cluster restart time:
1) Opening each store in parallel
2) Loading each store file for every store in parallel
3) Closing each store in parallel
4) Closing each store file for every store in parallel
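The core pattern is to submit one open task per store to a bounded thread pool and wait for all of them. It can be sketched with the standard `ExecutorService.invokeAll`. `ParallelOpenSketch` is hypothetical and stands in for the region-opening code. The same helper could be reused one level down for the store files inside each store, and for closing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelOpenSketch {
  /**
   * Runs one open task per store on a bounded pool and waits for all of them,
   * instead of opening the stores one after another. Results keep the task order.
   */
  static <T> List<T> openAll(List<Callable<T>> openTasks, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<T>> futures = pool.invokeAll(openTasks); // blocks until every task finishes
      List<T> opened = new ArrayList<>();
      for (Future<T> f : futures) {
        opened.add(f.get()); // a failed open surfaces here as an ExecutionException
      }
      return opened;
    } finally {
      pool.shutdownNow();
    }
  }
}
```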
[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181134#comment-13181134 ]

Liyin Tang commented on HBASE-5033:

Ping committers!

Opening/Closing store in parallel to reduce region open/close time

Key: HBASE-5033
URL: https://issues.apache.org/jira/browse/HBASE-5033
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch

Region servers open and close each store, and each store file within every store, sequentially, which makes opening and closing regions inefficient. This diff opens and closes stores in parallel to reduce region open/close time, which also helps reduce cluster restart time:
1) Opening each store in parallel
2) Loading each store file for every store in parallel
3) Closing each store in parallel
4) Closing each store file for every store in parallel
[jira] [Commented] (HBASE-4742) Split dead server's log in parallel
[ https://issues.apache.org/jira/browse/HBASE-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172723#comment-13172723 ]

Liyin Tang commented on HBASE-4742:

@Nicolas, this is only for 89-fb. Trunk already splits dead servers' logs in parallel, in a different way.

Split dead server's log in parallel

Key: HBASE-4742
URL: https://issues.apache.org/jira/browse/HBASE-4742
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D237.1.patch, D237.10.patch, D237.2.patch, D237.3.patch, D237.4.patch, D237.5.patch, D237.6.patch, D237.7.patch, D237.8.patch, D237.9.patch

When a region server goes down, the master shuts down the region server and splits its log. However, log splitting is a blocking call and can take some time. If more than one region server goes down, the master splits their logs one by one, which is not efficient. Since we have distributed log splitting, we could split the logs from the dead servers in parallel.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141360#comment-13141360 ]

Liyin Tang commented on HBASE-4532:

Shall we add an incompatible flag to this JIRA? Adding a new block type is not backward compatible.

Avoid top row seek by dedicated bloom filter for delete family bloom filter

Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch

The previous JIRA, HBASE-4469, avoids the top-row seek operation when the row-col bloom filter is enabled. This JIRA tries to avoid the top-row seek in all cases by creating a dedicated bloom filter just for delete family markers. The only subtle case is when we are interested in the top row with an empty column. For example, suppose we are interested in row1/cf1:/1/put. We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter says there is NO delete family. The scanner then skips the top-row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). In doing so, it has already skipped past the real kv we are interested in. The solution is to disable this optimization when we GET/SCAN a row with an empty column.
Evaluation from TestSeekOptimization (total seeks without optimization: 2506 in every configuration):

Previously:

bloom   compr  seeks with optimization  savings
NONE    NONE   1714 (68.40%)            31.60%
ROW     NONE   1714 (68.40%)            31.60%
ROWCOL  NONE   1458 (58.18%)            41.82%
NONE    GZ     1714 (68.40%)            31.60%
ROW     GZ     1714 (68.40%)            31.60%
ROWCOL  GZ     1458 (58.18%)            41.82%

So we get about 10 percentage points more seek savings ONLY if the ROWCOL bloom filter is enabled. [HBASE-4469]

After this change:

bloom   compr  seeks with optimization  savings
NONE    NONE   1458 (58.18%)            41.82%
ROW     NONE   1458 (58.18%)            41.82%
ROWCOL  NONE   1458 (58.18%)            41.82%
NONE    GZ     1458 (58.18%)            41.82%
ROW     GZ     1458 (58.18%)            41.82%
ROWCOL  GZ     1458 (58.18%)            41.82%

So we get about 10 percentage points more seek savings for ALL kinds of bloom filter.
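The decision described for this JIRA, including the empty-column exception, can be sketched as follows. A `Set` stands in for the delete family bloom filter. A real bloom filter may report false positives, which only cost an unnecessary seek. `DeleteFamilyBloomSketch` and its method names are hypothetical.

```java
import java.util.Set;

/** Sketch of the top-row-seek skipping decision; a Set stands in for the per-file bloom filter. */
final class DeleteFamilyBloomSketch {
  private final Set<String> rowsWithDeleteFamily; // hypothetical stand-in for the delete family bloom

  DeleteFamilyBloomSketch(Set<String> rowsWithDeleteFamily) {
    this.rowsWithDeleteFamily = rowsWithDeleteFamily;
  }

  /** A real bloom filter may answer true for absent rows, but never false for present ones. */
  boolean mightContainDeleteFamily(String row) {
    return rowsWithDeleteFamily.contains(row);
  }

  /**
   * True when the top-row seek (row/cf:/MAX_TS/MAXIMUM) can be replaced by a fake
   * "last on row/col" key: no delete family marker for this row in this file, and the
   * request is not for the empty column, whose real kv the fake key would skip past.
   */
  boolean canSkipTopRowSeek(String row, String qualifier) {
    if (qualifier.isEmpty()) return false; // the subtle case: keep the real seek
    return !mightContainDeleteFamily(row);
  }
}
```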
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138799#comment-13138799 ]

Liyin Tang commented on HBASE-4532:

Thanks, Jonathan, for the patch. I should have removed that line.

Avoid top row seek by dedicated bloom filter for delete family bloom filter

Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch

The previous JIRA, HBASE-4469, avoids the top-row seek operation when the row-col bloom filter is enabled. This JIRA tries to avoid the top-row seek in all cases by creating a dedicated bloom filter just for delete family markers. The only subtle case is when we are interested in the top row with an empty column. For example, suppose we are interested in row1/cf1:/1/put. We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter says there is NO delete family. The scanner then skips the top-row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). In doing so, it has already skipped past the real kv we are interested in. The solution is to disable this optimization when we GET/SCAN a row with an empty column.
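The seek decision described above, including the empty-column exception, can be sketched as a small Python function. This is an illustrative model, not HBase's scanner code. The function name and return labels are hypothetical, and a plain set stands in for the delete family Bloom filter. A real Bloom filter can return false positives, which only cost an unnecessary seek, but it never returns false negatives.

```python
from typing import AbstractSet

SEEK_TOP_OF_ROW = "SEEK_TOP_OF_ROW"      # must look for a delete family marker
SKIP_TOP_ROW_SEEK = "SKIP_TOP_ROW_SEEK"  # safe to jump straight to the column


def top_row_seek_decision(row: bytes, qualifier: bytes,
                          delete_family_bloom: AbstractSet[bytes]) -> str:
    """Decide whether a store file scanner must seek to the top of `row`.

    `delete_family_bloom` stands in for the dedicated delete family Bloom
    filter: it holds the rows that may carry a delete family marker.
    """
    if qualifier == b"":
        # Empty column: the wanted KV sorts among the row's first KVs, where
        # the top-of-row seek lands, so faking that seek would skip it.
        return SEEK_TOP_OF_ROW
    if row in delete_family_bloom:
        # "Maybe present": we have to read the row's delete family marker.
        return SEEK_TOP_OF_ROW
    # "Definitely absent": no delete family marker can hide this column.
    return SKIP_TOP_ROW_SEEK
```

For example, a GET on row1/cf1:q1 skips the top-row seek when the filter holds no entry for row1. A GET on row1/cf1: with an empty qualifier always seeks.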
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139016#comment-13139016 ] Liyin Tang commented on HBASE-4532: --- Thanks, Ted and Jonathan Gray, for committing this. I will double-check my submitted patches to avoid this problem. Nice catch, Jonathan Hsieh. Thank you for the patch :)
Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.0 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133585#comment-13133585 ] Liyin Tang commented on HBASE-4532: --- Thanks Ted :) Here are the test results I got. So has testConnectionUniqueness in TestHCM been fixed now?
Results :
Tests in error:
  testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM)
  testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): Unexpected exception, expected<org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException> but was<java.lang.NullPointerException>
  testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/dfs/name1/current/fsimage (Too many open files)
  testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/3949c75c-8c23-4513-b1cc-e94b1bba640b/dfs/name1/current/fsimage (Too many open files)
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
Tests run: 1056, Failures: 0, Errors: 7, Skipped: 9
Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133449#comment-13133449 ] Liyin Tang commented on HBASE-4532: --- For 89-fb, all the unit tests pass. For apache-trunk, 2 unit tests fail both with and without my change: TestHCM and TestDistributedLogSplitting.
Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch
[jira] [Commented] (HBASE-4191) Utilize getTopBlockLocations in load balancer
[ https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132200#comment-13132200 ] Liyin Tang commented on HBASE-4191: --- Hi Ted, have you started working on this? I have a similar feature to do :)
Utilize getTopBlockLocations in load balancer - Key: HBASE-4191 URL: https://issues.apache.org/jira/browse/HBASE-4191 Project: HBase Issue Type: Improvement Reporter: Ted Yu
HBASE-4114 implemented getTopBlockLocations(). The load balancer should use this method and assign the region being moved to the region server with the highest block affinity.
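One reading of "assign the region to the region server with the highest block affinity" is sketched below in Python. The input is assumed to be a map from host to the bytes of the region's HFile blocks stored locally on that host. That assumes what getTopBlockLocations() provides and does not describe its actual signature. The function name is hypothetical, and this is not the HBase load balancer's code.

```python
from typing import Dict, Iterable, Optional


def pick_server_by_affinity(local_bytes: Dict[str, int],
                            live_servers: Iterable[str]) -> Optional[str]:
    """Return the live server holding the most of the region's blocks.

    Returns None when no live server holds any block locally, so the caller
    can fall back to its usual placement policy.
    """
    best: Optional[str] = None
    best_weight = 0
    for server in sorted(live_servers):  # sorted: ties resolve deterministically
        weight = local_bytes.get(server, 0)
        if weight > best_weight:
            best, best_weight = server, weight
    return best
```

Only live servers are considered, because the host holding the most blocks may be the one that just died.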
[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism
[ https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131347#comment-13131347 ] Liyin Tang commented on HBASE-4633: --- I have also noticed some memory leak problems in the HBase client. Our symptom is that the memory footprint increases over time, but the actual heap size of the client does not, so the leak should come from non-heap memory. However, I am not sure whether the leak comes from the HBase client jar itself or just from our client code. So I am very interested to know: once you kept the heap size under control, was the memory leak solved?
Potential memory leak in client RPC timeout mechanism - Key: HBASE-4633 URL: https://issues.apache.org/jira/browse/HBASE-4633 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.3 Environment: HBase version: 0.90.3 + patches, Hadoop version: CDH3u0 Reporter: Shrijeet Paliwal
Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937, https://issues.apache.org/jira/browse/HBASE-4003 We have been using the 'hbase.client.operation.timeout' knob introduced in HBASE-2937 for quite some time now. It helps us enforce our SLA. We have two HBase clusters and two HBase client clusters. One of them is much busier than the other. We have seen deterministic behavior from the clients running in the busy cluster: their memory footprint increases consistently after they have been up for roughly 24 hours, almost doubling from its usual value (usual case == RPC timeout disabled). After much investigation nothing concrete came out, and we had to put in a hack that keeps the heap size under control even when the RPC timeout is enabled. Also note that the same behavior is not observed in the 'not so busy' cluster. The patch is here: https://gist.github.com/1288023
[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool
[ https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130137#comment-13130137 ] Liyin Tang commented on HBASE-4611: --- Very awesome. I have tried creating a review on Phabricator :)
Add support for Phabricator/Differential as an alternative code review tool --- Key: HBASE-4611 URL: https://issues.apache.org/jira/browse/HBASE-4611 Project: HBase Issue Type: Task Reporter: Jonathan Gray Attachments: D21.1.patch, D21.1.patch
From http://phabricator.org/ : Phabricator is a open source collection of web applications which make it easier to write, review, and share source code. It is currently available as an early release. Phabricator was developed at Facebook. It's open source, so pretty much anyone could host an instance of this software. To begin with, there will be a public-facing instance located at http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL http://osuosl.org). We will use this JIRA to deal with adding (and ensuring) Apache-friendly support that will allow us to do code reviews with Phabricator for HBase.
[jira] [Commented] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129511#comment-13129511 ] Liyin Tang commented on HBASE-4585: --- I have run all the unit tests. The following unit tests fail both with and without the patch for this JIRA: TestAvroServer and TestDistributedLogSplitting.
Avoid seek operation when current kv is deleted --- Key: HBASE-4585 URL: https://issues.apache.org/jira/browse/HBASE-4585 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.0 Attachments: hbase-4585-89.patch, hbase-4585-apache-trunk.patch
When the current KV turns out to be deleted during matching in the ScanQueryMatcher, the matcher currently returns SKIP and continues. However, if the current KV is deleted because of a family delete or a column delete, the matcher should seek to the next column. If the current KV is deleted because of a version delete, the matcher should just return SKIP.
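The proposed rule maps the reason a KV is deleted to the matcher's next step. The Python model below is a sketch. The enum names echo HBase's DeleteTracker and ScanQueryMatcher vocabulary but are simplified, hypothetical stand-ins, not the real Java types. Seeking to the next column is safe for family and column deletes because such a marker covers every version at or below its timestamp, and a column's versions are visited newest first.

```python
from enum import Enum


class DeleteResult(Enum):
    NOT_DELETED = "NOT_DELETED"
    FAMILY_DELETED = "FAMILY_DELETED"    # covered by a delete family marker
    COLUMN_DELETED = "COLUMN_DELETED"    # covered by a delete column marker
    VERSION_DELETED = "VERSION_DELETED"  # only this exact version is deleted


class MatchCode(Enum):
    INCLUDE = "INCLUDE"
    SKIP = "SKIP"
    SEEK_NEXT_COL = "SEEK_NEXT_COL"


def match_after_delete_check(result: DeleteResult) -> MatchCode:
    """Proposed handling of a KV according to why (or whether) it is deleted."""
    if result in (DeleteResult.FAMILY_DELETED, DeleteResult.COLUMN_DELETED):
        # All older versions of this column are covered too: jump past them.
        return MatchCode.SEEK_NEXT_COL
    if result is DeleteResult.VERSION_DELETED:
        # Older versions of the same column may still be live.
        return MatchCode.SKIP
    # Not deleted: normal matching continues (simplified here to INCLUDE).
    return MatchCode.INCLUDE
```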
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126760#comment-13126760 ] Liyin Tang commented on HBASE-4469: --- @stack: HBASE-4469 avoids the top-row seek when the ROWCOL Bloom filter is enabled, and HBASE-4532 will avoid it when the ROW or NONE Bloom filter is enabled. Together, HBASE-4469 and HBASE-4532 cover all the cases, and this one needs to be committed first :)
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang
The problem is that when seeking for a row/column in the HFile, we go to the top of the row in order to check for a row delete marker (delete family). However, if the Bloom filter is enabled for the column family and a delete family operation has been done on a row, the row has already been added to the Bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
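Which filter answers "might this row carry a delete family marker?" therefore depends on the column family's Bloom filter type. The Python sketch below uses hypothetical names, with sets standing in for Bloom filters. It assumes the ROWCOL filter is keyed by row plus qualifier, so a delete family marker (which has an empty qualifier) is recorded under (row, empty qualifier).

```python
from typing import AbstractSet, Tuple


def may_have_delete_family(bloom_type: str, row: bytes,
                           rowcol_bloom: AbstractSet[Tuple[bytes, bytes]],
                           delete_family_bloom: AbstractSet[bytes]) -> bool:
    """Return False only when the store file certainly has no delete family
    marker for `row`, i.e. when the top-of-row seek can be skipped."""
    if bloom_type == "ROWCOL":
        # HBASE-4469 case (assumed keying): the delete family marker has an
        # empty qualifier, so it is already in the ROWCOL filter as (row, b"").
        return (row, b"") in rowcol_bloom
    if bloom_type in ("ROW", "NONE"):
        # HBASE-4532 case: consult the dedicated delete family filter.
        return row in delete_family_bloom
    raise ValueError(f"unknown Bloom filter type: {bloom_type}")
```

Under that assumption, a dedicated delete family filter would add nothing for ROWCOL families.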
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126979#comment-13126979 ] Liyin Tang commented on HBASE-4418: --- @stack, it is pretty safe to commit HBASE-4418_1.patch :)
Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-4418_1.patch, HBASE-4418_2.patch
The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which settings are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes all the HBase default settings defined in code rather than in a config file.
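The listing described above can be modelled as code defaults overlaid by configuration-file entries, with each key shown next to its effective value and its source. The Python sketch below is hypothetical. The function name is invented, the sample keys' values in the test are illustrative only, and it does not reflect how the web UI page is actually implemented.

```python
from typing import Dict, List


def effective_config_lines(code_defaults: Dict[str, str],
                           file_entries: Dict[str, str]) -> List[str]:
    """Render every effective setting as 'key = value  [source]', sorted by key.

    Entries from configuration files override defaults defined in code.
    """
    merged = {k: (v, "code default") for k, v in code_defaults.items()}
    for key, value in file_entries.items():
        merged[key] = (value, "config file")
    return [f"{k} = {v}  [{src}]" for k, (v, src) in sorted(merged.items())]
```

Showing the source next to each value makes it easy to spot settings that silently fall back to a code default.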
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127066#comment-13127066 ] Liyin Tang commented on HBASE-4469: --- Cool, I just downloaded the patch from Review Board (https://reviews.apache.org/r/2235/) and attached it here :) Thanks, Jonathan.
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-4469_1.patch
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127114#comment-13127114 ] Liyin Tang commented on HBASE-4469: --- @Jonathan, this JIRA specifically was committed to the internal 89-fb branch before the public 89-fb branch was cut, so it should already be in the public 89-fb branch.
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.0 Attachments: HBASE-4469_1.patch
[jira] [Commented] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127272#comment-13127272 ] Liyin Tang commented on HBASE-4585: --- Patches for 89-fb and apache trunk are both available now.
Avoid seek operation when current kv is deleted --- Key: HBASE-4585 URL: https://issues.apache.org/jira/browse/HBASE-4585 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4585-89.patch, hbase-4585-trunk.patch
[jira] [Commented] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127273#comment-13127273 ] Liyin Tang commented on HBASE-4585: --- Patches for 89-fb and apache trunk are both available now.
Avoid seek operation when current kv is deleted --- Key: HBASE-4585 URL: https://issues.apache.org/jira/browse/HBASE-4585 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4585-89.patch, hbase-4585-trunk.patch
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13126336#comment-13126336 ] Liyin Tang commented on HBASE-4469: --- HBASE-4532 will enable the delete family Bloom filter only when the ROW or NONE Bloom filter is enabled, because if there is a delete family in the store file, the ROWCOL Bloom filter already has this information.
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122161#comment-13122161 ] Liyin Tang commented on HBASE-4469: --- Yes, I didn't change the unit test TestBlocksRead, which passes successfully.
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang
[jira] [Commented] (HBASE-4241) Optimize flushing of the Store cache for max versions and (new) min versions
[ https://issues.apache.org/jira/browse/HBASE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119629#comment-13119629 ] Liyin Tang commented on HBASE-4241: --- Hi Lars, thanks for your patch. I am trying to backport this feature to hbase-89, and I have a quick question :) Why do we use CollectionBackedScanner rather than reusing the memstore scanner? Thanks a lot.
Optimize flushing of the Store cache for max versions and (new) min versions Key: HBASE-4241 URL: https://issues.apache.org/jira/browse/HBASE-4241 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4241-v2.txt, 4241-v8.txt, 4241.txt
As discussed with Jon, there is room for improvement in how the memstore is flushed to disk. Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore. The same holds for the new minVersions feature: if we find at least minVersions versions in the store, we can remove all further versions that are expired. Generally we should use the same mechanism here that is used for compaction, i.e. StoreScanner. We only need to add a scanner to the memstore that can scan along the current snapshot.
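The pruning rule in the description is sketched below in Python over (column, timestamp) pairs. It keeps at most maxVersions versions per column, and drops expired versions only beyond the first minVersions. This is an illustrative model, not the StoreScanner-based implementation from the patch. The function and parameter names are hypothetical, and values, rows and delete markers are left out.

```python
from itertools import groupby
from typing import List, Tuple

Cell = Tuple[str, int]  # (column, timestamp); values and rows omitted


def prune_for_flush(cells: List[Cell], max_versions: int, min_versions: int,
                    ttl: int, now: int) -> List[Cell]:
    """Return the cells that survive a flush under the version/TTL rules."""
    kept: List[Cell] = []
    # Order like a memstore snapshot: by column, newest timestamp first.
    ordered = sorted(cells, key=lambda c: (c[0], -c[1]))
    for _, versions in groupby(ordered, key=lambda c: c[0]):
        for i, (column, ts) in enumerate(versions):
            if i >= max_versions:
                break  # maxVersions newer versions already kept
            expired = now - ts > ttl
            if expired and i >= min_versions:
                break  # older versions are expired too, and minVersions is met
            kept.append((column, ts))
    return kept
```

Because versions are visited newest first, the first expired version past minVersions means every remaining version of that column can be dropped. That is why the inner loop can simply break.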
[jira] [Commented] (HBASE-4522) Make hbase-site-custom.xml override the hbase-site.xml
[ https://issues.apache.org/jira/browse/HBASE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118284#comment-13118284 ] Liyin Tang commented on HBASE-4522: --- @Jonathon: It can :) That's why I am wondering whether I should open source this change :) For us, hbase-site.xml works as hbase-default.xml and hbase-site-custom.xml works as hbase-site.xml. That's why we need hbase-site-custom.xml to override hbase-site.xml. But in the open source trunk, we don't even have hbase-site-custom.xml at all.
Make hbase-site-custom.xml override the hbase-site.xml -- Key: HBASE-4522 URL: https://issues.apache.org/jira/browse/HBASE-4522 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Liyin Tang Priority: Minor Fix For: 0.94.0
The motivation for this diff is that we want to easily override some configuration for any specific cluster just by adding the config entries to that cluster's hbase-site-custom.xml. This change adds the hbase-site-custom.xml configuration file to HBaseConfiguration.
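The override order can be illustrated by loading Hadoop-style configuration XML documents in sequence, with each later document winning. Hadoop's Configuration likewise lets later resources override earlier ones unless a property is marked final. The Python sketch below ignores <final> entirely. Its function name is hypothetical, and it does not reproduce how HBaseConfiguration registers its resources.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def load_layered(xml_docs: Iterable[str]) -> Dict[str, str]:
    """Merge <configuration> documents in order; later documents win."""
    conf: Dict[str, str] = {}
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for prop in root.findall("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name is not None and value is not None:
                conf[name.strip()] = value.strip()
    return conf
```

Passing the contents of hbase-site.xml first and hbase-site-custom.xml second gives the per-cluster file the final say.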
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118542#comment-13118542 ] Liyin Tang commented on HBASE-4418: --- Thanks stack and Todd. I have attached a very simple patch here. If the user runs Hadoop with HADOOP-6408, /conf will show all the HBase configuration. If the user runs Hadoop without HADOOP-6408, nothing will change :) What do you guys think? Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-4418_1.patch The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration takes effect and what its value is. The configuration page shows all the HBase and DFS configuration entries in the configuration files and also includes all the HBase default settings that are set in code rather than in a config file.
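What the configuration page is meant to expose is the effective value of every key, including defaults that exist only in code. A hypothetical sketch of building such a view: merge code defaults with config-file entries and record where each value came from. `EffectiveConfigView` and the source labels are illustrative assumptions; this is not the HADOOP-6408 /conf servlet, and the sample values are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical "effective configuration" view: every key, its value, and its source. */
final class EffectiveConfigView {
    /** Returns key -> "value [source]", sorted by key; config-file entries override code defaults. */
    static Map<String, String> build(Map<String, String> codeDefaults,
                                     Map<String, String> fileEntries) {
        Map<String, String> view = new TreeMap<>();
        for (Map.Entry<String, String> e : codeDefaults.entrySet()) {
            view.put(e.getKey(), e.getValue() + " [code default]");
        }
        for (Map.Entry<String, String> e : fileEntries.entrySet()) {
            view.put(e.getKey(), e.getValue() + " [config file]"); // overrides the default, if any
        }
        return view;
    }

    /** Renders the view as one "key = value [source]" line per entry. */
    static String render(Map<String, String> view) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : view.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Recording the source next to each value is what makes "which configuration takes effect" answerable at a glance, since a key that never appears in any file would otherwise be invisible.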
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118544#comment-13118544 ] Liyin Tang commented on HBASE-4418: --- BTW, I have tested it with hadoop-22 and it works. Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-4418_1.patch The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration takes effect and what its value is. The configuration page shows all the HBase and DFS configuration entries in the configuration files and also includes all the HBase default settings that are set in code rather than in a config file.
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118585#comment-13118585 ] Liyin Tang commented on HBASE-4418: --- @stack, HBASE-4418_2.patch adds 2 links from the master and region server web UIs to the configuration page. If you are going to base hbase-trunk on hadoop-0.23, then HBASE-4418_2.patch is better. Otherwise HBASE-4418_1.patch is better. Thanks Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-4418_1.patch, HBASE-4418_2.patch The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration takes effect and what its value is. The configuration page shows all the HBase and DFS configuration entries in the configuration files and also includes all the HBase default settings that are set in code rather than in a config file.
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117573#comment-13117573 ] Liyin Tang commented on HBASE-4418: --- @stack, is it true that all the patches for hbase trunk should be rebased on hadoop trunk or hadoop-0.24 (the latest release)? Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration takes effect and what its value is. The configuration page shows all the HBase and DFS configuration entries in the configuration files and also includes all the HBase default settings that are set in code rather than in a config file.
[jira] [Commented] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117874#comment-13117874 ] Liyin Tang commented on HBASE-4418: --- @Todd, I have created HADOOP-7702, which will show all the default configuration values in the /conf servlet, so HBase can reuse them for free. Can you assign that jira to me? Thanks Show all the hbase configuration in the web ui -- Key: HBASE-4418 URL: https://issues.apache.org/jira/browse/HBASE-4418 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration takes effect and what its value is. The configuration page shows all the HBase and DFS configuration entries in the configuration files and also includes all the HBase default settings that are set in code rather than in a config file.
[jira] [Commented] (HBASE-4491) HBase Locality Checker
[ https://issues.apache.org/jira/browse/HBASE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115025#comment-13115025 ] Liyin Tang commented on HBASE-4491: --- @Ted: Yes, it looks like this is covered by HBASE-4191. I can follow up on HBASE-4191. HBase Locality Checker -- Key: HBASE-4491 URL: https://issues.apache.org/jira/browse/HBASE-4491 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang If we run the data node and region server on the same physical machine, the region server will benefit if the store files for the regions it serves have a local replica in the data node process. So for each region, there exists a best-locality region server, which has the most local blocks for that region. The HBase Locality Checker will show how many regions are running on their best-locality region server. The higher the number, the more performance benefit HBase can get from data locality. There would also be a follow-up task to use this region locality information for region assignment: the assignment manager will prefer to assign each region to its best-locality region server.
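The locality metric described above can be computed from two inputs: how many blocks of each region's store files are local to each region server host, and the current region assignment. A minimal sketch, assuming both maps have already been collected; the `LocalityChecker` class, its methods, and the map layout are hypothetical and not the HBASE-4491 code. Ties for the maximum count as "best", and regions with no local blocks anywhere are skipped.

```java
import java.util.Map;

/** Hypothetical locality checker over precomputed per-region local block counts. */
final class LocalityChecker {
    /** Returns a server with the most local blocks for one region (any one of tied servers), or null if none. */
    static String bestServer(Map<String, Integer> localBlocksByServer) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : localBlocksByServer.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /**
     * Counts regions whose current server holds the maximum number of local blocks
     * for that region. localBlocks: region -> (server -> local block count);
     * assignment: region -> server currently serving it.
     */
    static int regionsOnBestServer(Map<String, Map<String, Integer>> localBlocks,
                                   Map<String, String> assignment) {
        int count = 0;
        for (Map.Entry<String, String> a : assignment.entrySet()) {
            Map<String, Integer> perServer = localBlocks.get(a.getKey());
            if (perServer == null || perServer.isEmpty()) {
                continue; // no locality data for this region
            }
            int max = 0;
            for (int n : perServer.values()) {
                max = Math.max(max, n);
            }
            if (max == 0) {
                continue; // no server has local blocks, so there is no best server
            }
            if (perServer.getOrDefault(a.getValue(), 0) == max) {
                count++;
            }
        }
        return count;
    }
}
```

`bestServer` is the piece a locality-aware assignment manager could reuse for the follow-up task, while `regionsOnBestServer` gives the single number the checker reports.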