[jira] [Comment Edited] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714140#comment-15714140 ]

Phil Yang edited comment on HBASE-17177 at 12/2/16 5:45 AM:

I think we should first be able to tell whether we can return a consistent view to a reopened scanner, whether or not the region has moved. So we should record the minReadPoint of the last major compaction, and we should also know it when we open a region. We can add a field to the HFile's header: if the file was generated by a major compaction, this field holds the minReadPoint that the compaction used. With this, when a scanner arrives we will know whether or not we can return a consistent view.

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the server. If there are no requests within TTL milliseconds, we can remove the scanner. So I think that when we open a region, we can wait for the same period before we do a major compaction. Although the scanner may already have expired at the former RS, this is safe, and the TTL is not a long time.

was (Author: yangzhe1991):
I think we should first be able to tell whether we can return a consistent view to a reopened scanner, whether or not the region has moved. So we should record the minReadPoint of the last major compaction, and we should also know it when we open a region. We can add a field to the HFile's header: if the file was generated by a major compaction, this filed holds the minReadPoint that the compaction used. With this, when a scanner arrives we will know whether or not we can return a consistent view.

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the server. If there are no requests within TTL milliseconds, we can remove the scanner. So I think that when we open a region, we can wait for the same period before we do a major compaction. Although the scanner may already have expired at the former RS, this is safe, and the TTL is not a long time.
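The two checks proposed in the comment above can be sketched as a small Python model. This is a toy illustration, not HBase code: `RegionState`, `may_major_compact`, and `can_serve_consistent_view` are hypothetical names, the 60000 ms TTL is only an example value for hbase.client.scanner.timeout.period, and the simple comparison against the recorded minReadPoint is an assumption about how that field would be used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionState:
    """Toy model of the per-region state the proposal relies on (hypothetical)."""

    opened_at_ms: int  # time at which this RS opened the region
    scanner_ttl_ms: int  # stands in for hbase.client.scanner.timeout.period
    # minReadPoint recorded by the last major compaction, as read from the
    # proposed HFile header field; None if no such value was recorded.
    last_major_min_read_point: Optional[int] = None

    def may_major_compact(self, now_ms: int) -> bool:
        # Wait one scanner TTL after region open. By then, any scanner opened
        # at the former RS has either re-registered here or has expired.
        return now_ms - self.opened_at_ms >= self.scanner_ttl_ms

    def can_serve_consistent_view(self, scanner_read_point: int) -> bool:
        # Assumption: a scanner whose read point is older than the read point
        # used by the last major compaction may miss cells that the
        # compaction already dropped, so its view cannot be guaranteed.
        if self.last_major_min_read_point is None:
            return True
        return scanner_read_point >= self.last_major_min_read_point


if __name__ == "__main__":
    region = RegionState(opened_at_ms=1_000, scanner_ttl_ms=60_000,
                         last_major_min_read_point=50)
    print(region.may_major_compact(now_ms=30_000))   # still inside the wait
    print(region.may_major_compact(now_ms=61_000))   # wait is over
    print(region.can_serve_consistent_view(scanner_read_point=49))
    print(region.can_serve_consistent_view(scanner_read_point=50))
```

In this model the delay costs at most one TTL per region open, which matches the remark that the TTL is not a long time.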
> Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
> ---
>
>                 Key: HBASE-17177
>                 URL: https://issues.apache.org/jira/browse/HBASE-17177
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scan
>            Reporter: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>
>
> We know that major compaction will actually delete the cells that are
> deleted by a delete marker. To give a scan a consistent view, we need to
> use a map to track the read points of all scanners on a region, and the
> smallest one will be used for a compaction. Delete markers whose mvcc is
> greater than this value will not be used to delete other cells.
> The problem with a scan restart after a region move is that the new RS has
> no information about the scanners opened at the old RS until the client
> sends scan requests to the new RS. This means the read points map is
> incomplete, and the smallest read point may be greater than the correct
> value. So if a major compaction happens at that time, it may delete some
> cells that should be kept.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
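The failure described in the issue text above can be reproduced with a small Python model. This is a simplified sketch under stated assumptions, not HBase's compaction code: cells and delete markers are reduced to `(row, mvcc)` tuples, and `smallest_read_point` and `major_compact` are hypothetical helper names. The fallback to the current mvcc when no scanner is registered is also an assumption of the model.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]          # (row, mvcc of the put)
DeleteMarker = Tuple[str, int]  # (row, mvcc of the delete)


def smallest_read_point(scanner_read_points: Dict[str, int],
                        current_mvcc: int) -> int:
    # The smallest read point among registered scanners; with no scanners
    # registered, this model falls back to the current mvcc.
    return min(scanner_read_points.values(), default=current_mvcc)


def major_compact(cells: List[Cell], markers: List[DeleteMarker],
                  smallest_rp: int) -> List[Cell]:
    # Only delete markers at or below the smallest read point may be applied;
    # newer markers are ignored so that open scanners keep their view.
    usable = [m for m in markers if m[1] <= smallest_rp]
    return [
        c for c in cells
        if not any(m[0] == c[0] and c[1] <= m[1] for m in usable)
    ]


if __name__ == "__main__":
    cells = [("row1", 3)]
    markers = [("row1", 8)]

    # Old RS: a scanner with read point 5 is registered, so the marker at
    # mvcc 8 is not applied and the cell survives.
    old_rs = {"scanner-a": 5}
    print(major_compact(cells, markers, smallest_read_point(old_rs, 10)))
    # prints [('row1', 3)]

    # New RS right after the region move: the map is empty, the smallest
    # read point jumps to 10, and the cell the scanner still needs is dropped.
    new_rs: Dict[str, int] = {}
    print(major_compact(cells, markers, smallest_read_point(new_rs, 10)))
    # prints []
```

When the scanner with read point 5 later restarts on the new RS, the cell at mvcc 3 is already gone, which is the broken atomicity the issue describes.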
[jira] [Comment Edited] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714140#comment-15714140 ]

Phil Yang edited comment on HBASE-17177 at 12/2/16 5:45 AM:

I think we should first be able to tell whether we can return a consistent view to a reopened scanner, whether or not the region has moved. So we should record the minReadPoint of the last major compaction, and we should also know it when we open a region. We can add a field to the HFile's header: if the file was generated by a major compaction, this field holds the minReadPoint that the compaction used. With this, when a scanner arrives we will know whether or not we can return a consistent view.

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the server. If there are no requests within TTL milliseconds, we can remove the scanner. So I think that when we open a region, we can wait for the same period before we do a major compaction. Although the scanner may already have expired at the former RS, this is safe, and the TTL is not a long time.

was (Author: yangzhe1991):
I think we should first be able to tell whether we can return a consistent view to a reopened scanner, whether or not the region has moved. So we should record the minReadPoint of the last major compaction, and we should also know it when we open a region. We can add a filed to the HFile's header: if the file was generated by a major compaction, this filed holds the minReadPoint that the compaction used. With this, when a scanner arrives we will know whether or not we can return a consistent view.

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the server. If there are no requests within TTL milliseconds, we can remove the scanner. So I think that when we open a region, we can wait for the same period before we do a major compaction. Although the scanner may already have expired at the former RS, this is safe, and the TTL is not a long time.