[jira] [Comment Edited] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714140#comment-15714140
 ] 

Phil Yang edited comment on HBASE-17177 at 12/2/16 5:45 AM:


First, I think we need to know whether we can return a consistent view to a 
reopened scanner, whether or not the region has moved. So we should record the 
minReadPoint of the last major compaction, and it should be available when we 
open a region. We can add a field to the HFile's header: if the file was 
generated by a major compaction, this field holds the minReadPoint that the 
compaction used. Then, when a scanner arrives, we will know whether we can 
return a consistent view.
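
A minimal sketch of the check this would enable, in Java. The class and method 
names are hypothetical and not part of HBase. It assumes the recorded 
minReadPoint has already been read from the store file, and that a scanner's 
view is consistent exactly when its read point is not smaller than that value 
(my reading of the proposal, not a confirmed rule).

```java
import java.util.OptionalLong;

// Hypothetical helper, not an HBase class.
final class ReopenedScanCheck {
    /**
     * @param scannerReadPoint the mvcc read point the client passes back when
     *                         it reopens the scanner
     * @param lastMajorCompactionMinReadPoint the minReadPoint recorded by the
     *                         last major compaction; empty if none is recorded
     * @return true if the compaction cannot have removed cells that this
     *         scanner should still see (under the assumption stated above)
     */
    static boolean canServeConsistentView(long scannerReadPoint,
                                          OptionalLong lastMajorCompactionMinReadPoint) {
        if (!lastMajorCompactionMinReadPoint.isPresent()) {
            // Assumption: with no major compaction recorded, no cells were
            // physically removed on behalf of delete markers.
            return true;
        }
        // The compaction only applied delete markers with mvcc <= minReadPoint.
        // A scanner at or above that point could already see all of them.
        return scannerReadPoint >= lastMajorCompactionMinReadPoint.getAsLong();
    }
}
```

For example, a scanner reopened with read point 5 against a file whose 
recorded minReadPoint is 8 would be rejected, because delete markers with mvcc 
6 to 8 may already have removed cells it should still see.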

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the 
server. If there are no requests within TTL milliseconds, we can remove the 
scanner. So I think that after we open a region, we can wait for the same 
period before doing a major compaction. The scanner may already have expired 
on the former RS, but waiting is safe and the TTL is not long.
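
The waiting rule could look roughly like the following Java sketch. The class, 
its fields, and the method name are hypothetical. It assumes the caller 
supplies the region open time and the configured value of 
hbase.client.scanner.timeout.period in milliseconds.

```java
// Hypothetical helper, not an HBase class.
final class MajorCompactionGate {
    private final long regionOpenTimeMs;
    private final long scannerTimeoutMs;

    MajorCompactionGate(long regionOpenTimeMs, long scannerTimeoutMs) {
        this.regionOpenTimeMs = regionOpenTimeMs;
        this.scannerTimeoutMs = scannerTimeoutMs;
    }

    /**
     * A major compaction is allowed only once a full scanner TTL has passed
     * since the region was opened. By then, any scanner opened on the former
     * RS has either sent a request to this RS or has expired.
     */
    boolean mayStartMajorCompaction(long nowMs) {
        return nowMs - regionOpenTimeMs >= scannerTimeoutMs;
    }
}
```

With a region opened at t = 1000 ms and a TTL of 60000 ms, 
mayStartMajorCompaction returns false until t = 61000 ms.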


was (Author: yangzhe1991):
I think at first we should know if we can return a consistent view to a 
reopened scanner, no matter the region is moved or not. So we should record the 
minReadPoint of last major compaction and when we open a region we should also 
know it. We can add a field to HFile's header and if it is generated by a major 
compaction this filed is the minReadPoint that the compaction used. After this 
we will know when a scanner comes, we can return a consistent view or not.

Now we have a TTL(hbase.client.scanner.timeout.period) for scanner in server. 
If there is no requests within TTL milliseconds, we can remove the scanner. So 
I think when we open a region, we can wait same time before we want to do a 
major compaction. Although the scanner may has been expired at former RS, it is 
safe and TTL is not a long time.

> Major compaction can break the region/row level atomic when scan even if we 
> pass mvcc to client
> ---
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> We know that a major compaction physically removes the cells that are 
> deleted by a delete marker. To give a scan a consistent view, we 
> need a map that tracks the read points of all scanners on a region, and 
> the smallest one is used for a compaction. Delete markers whose 
> mvcc is greater than this value are not used to delete other cells.
> The problem with restarting a scan after a region move is that the new RS 
> does not know about the scanners opened at the old RS until the 
> client sends scan requests to the new RS. The read points map is therefore 
> incomplete, and the smallest read point may be greater than the correct value. 
> So if a major compaction happens at that time, it may delete some cells that 
> should be kept.
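
The mechanism in the description can be illustrated with a small Java sketch. 
All names are hypothetical and this is not HBase's actual implementation. It 
assumes a per-region map from scanner id to read point, and falls back to the 
region's current read point when no scanner is registered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker, not an HBase class.
final class ReadPointTracker {
    private final Map<Long, Long> readPointByScanner = new ConcurrentHashMap<>();

    void register(long scannerId, long readPoint) {
        readPointByScanner.put(scannerId, readPoint);
    }

    void unregister(long scannerId) {
        readPointByScanner.remove(scannerId);
    }

    /** Smallest read point among open scanners, capped by the region's own. */
    long smallestReadPoint(long regionReadPoint) {
        long min = regionReadPoint;
        for (long readPoint : readPointByScanner.values()) {
            min = Math.min(min, readPoint);
        }
        return min;
    }

    /** A compaction may apply a delete marker only if every scanner can see it. */
    static boolean mayApplyDeleteMarker(long markerMvcc, long smallestReadPoint) {
        return markerMvcc <= smallestReadPoint;
    }
}
```

On the old RS, a scanner registered at read point 5 keeps a delete marker with 
mvcc 7 from being applied. On the new RS the map starts empty, so with a region 
read point of 10 the same marker is applied, and cells the scanner should still 
see are removed.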



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

