[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714140#comment-15714140 ]

Phil Yang commented on HBASE-17177:
---

I think we should first determine whether we can return a consistent view to a reopened scanner, regardless of whether the region has moved. So we should record the minReadPoint of the last major compaction, and we should also know it when we open a region. We can add a field to the HFile's header; if the file was generated by a major compaction, this field holds the minReadPoint that the compaction used. Then, when a scanner arrives, we will know whether we can return a consistent view.

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the server. If there are no requests within the TTL (in milliseconds), we can remove the scanner. So I think that when we open a region, we can wait for the same period before doing a major compaction. Although the scanner may already have expired at the former RS, this is safe, and the TTL is not long.

> Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
> ---
>
>                 Key: HBASE-17177
>                 URL: https://issues.apache.org/jira/browse/HBASE-17177
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scan
>            Reporter: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>
> We know that major compaction will actually delete the cells which are
> deleted by a delete marker. In order to give a consistent view for a scan, we
> need to use a map to track the read points of all scanners for a region, and
> the smallest one will be used for a compaction. Delete markers whose mvcc is
> greater than this value will not be used to delete other cells.
> The problem for a scan restart after a region move is that the new RS does
> not have the information about the scanners opened at the old RS before the
> client sends scan requests to the new RS, which means the read points map is
> incomplete and the smallest read point may be greater than the correct value.
> So if a major compaction happens at that time, it may delete some cells which
> should be kept.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
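The bookkeeping described in the issue and the check proposed in the comment can be sketched roughly as follows. This is a minimal illustration, not HBase code: the class `RegionReadPoints` and all of its method names are hypothetical, and persisting the recorded read point in the HFile is represented only by a constructor argument.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-region bookkeeping; these are not HBase's actual classes. */
final class RegionReadPoints {
    // scanner id -> mvcc read point of that scanner
    private final Map<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
    // minReadPoint used by the last major compaction (assumed to be loaded
    // from the HFile when the region opens, as the comment proposes)
    private volatile long lastMajorCompactionReadPoint;

    RegionReadPoints(long persistedMajorCompactionReadPoint) {
        this.lastMajorCompactionReadPoint = persistedMajorCompactionReadPoint;
    }

    void openScanner(long scannerId, long readPoint) {
        scannerReadPoints.put(scannerId, readPoint);
    }

    void closeScanner(long scannerId) {
        scannerReadPoints.remove(scannerId);
    }

    /** Smallest read point among open scanners, or the current mvcc if there are none. */
    long smallestReadPoint(long currentMvcc) {
        long min = currentMvcc;
        for (long readPoint : scannerReadPoints.values()) {
            min = Math.min(min, readPoint);
        }
        return min;
    }

    /** A delete marker may purge cells only if its mvcc is not greater than the compaction read point. */
    static boolean markerMayPurge(long markerMvcc, long compactionReadPoint) {
        return markerMvcc <= compactionReadPoint;
    }

    /** Records and returns the read point that a major compaction would use. */
    long majorCompact(long currentMvcc) {
        lastMajorCompactionReadPoint = smallestReadPoint(currentMvcc);
        return lastMajorCompactionReadPoint;
    }

    /** A reopened scanner sees a consistent view only if no newer compaction has purged cells it needs. */
    boolean canServeConsistentView(long scannerReadPoint) {
        return scannerReadPoint >= lastMajorCompactionReadPoint;
    }
}
```

In this sketch, a freshly opened region has an empty map, so a major compaction at current mvcc 150 records 150 as its read point; a scanner that restarts with read point 100 is then rejected by `canServeConsistentView`, which is the situation the issue describes.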
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714130#comment-15714130 ]

Duo Zhang commented on HBASE-17177:
---

{quote}
Not sure about NONE/ROW/REGION. Can we do REGION first, since mvcc is by region, and then if needed do ROW and NONE.
{quote}

NONE/ROW/REGION is the lower bound. If there is no error, we will always have REGION-level atomicity. The problem only arises when there is an error and we need to reopen a scanner. We will try our best to keep REGION-level atomicity but, as said above, we cannot always succeed. If that happens, we will use the 'atomicity' option to determine whether we can go on or must throw an exception to the user.

Thanks.
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714094#comment-15714094 ]

stack commented on HBASE-17177:
---

A region opens after a move, and a major compaction could start. It would look for the smallest read point. There might be none, so it would think it could clean up all deletes. Afterwards, a restarted scan comes in with an mvcc that is older than the current read point.

The region does not keep a record of the mvcc that the last or currently ongoing major compaction used. If it did, we could fail the scan if its mvcc was older than that of the major compaction.

Yeah, it seems smart to delay major compaction until a good while after a region opens, so restarted scanners have a chance of getting back in. Can we find a latch that is not time-based (wait a few minutes)?

Compactions get promoted from minor to major if the minor compaction happens to include all hfiles. We'd have to undo this or not allow the upgrade.

Not sure about NONE/ROW/REGION. Can we do REGION first, since mvcc is by region, and then if needed do ROW and NONE.

This is an awkward problem.
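The delay after region open, together with blocking the minor-to-major promotion during that window, might look roughly like the sketch below. It is time-based, so it does not answer the question about a non-time-based latch; the class `MajorCompactionGate`, its methods, and the delay value are all assumptions made for illustration.

```java
/** Hypothetical gate that holds back major compactions for a while after a region opens. */
final class MajorCompactionGate {
    private final long regionOpenTimeMs;
    private final long delayMs; // assumed to be something like the scanner TTL

    MajorCompactionGate(long regionOpenTimeMs, long delayMs) {
        this.regionOpenTimeMs = regionOpenTimeMs;
        this.delayMs = delayMs;
    }

    /** True while restarted scanners may still be on their way back to this region. */
    boolean inNoMajorCompactionPeriod(long nowMs) {
        return nowMs - regionOpenTimeMs < delayMs;
    }

    /**
     * Decides whether a compaction runs as major. A compaction that selects all
     * files would normally be promoted to major; inside the window neither an
     * explicit request nor a promotion is allowed.
     */
    boolean runAsMajor(boolean requestedMajor, boolean selectsAllFiles, long nowMs) {
        if (inNoMajorCompactionPeriod(nowMs)) {
            return false;
        }
        return requestedMajor || selectsAllFiles;
    }
}
```

A caller would construct the gate when the region opens and consult `runAsMajor` each time a compaction is selected, so that a compaction inside the window keeps its delete markers.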
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713934#comment-15713934 ]

Duo Zhang commented on HBASE-17177:
---

I have been thinking about this for days. I think we should have a scan option called 'atomicity' with three values: {{None}}, {{Row}} and {{Region}}. The default value will be {{Row}}. This will change the way errors are handled on the client side.

For {{None}}, in general we can recover from any exception by opening a new region scanner, unless we time out.

For {{Row}}, if allowPartial is enabled and we fail in the middle of a row, then it is not always safe to reopen a new scanner. We need to do something on the server side. If the RS receives an open-scanner request that carries an mvcc read point, we need to check whether the read point is greater than or equal to the current smallest read point, or whether we are in the 'no major compaction period' introduced above. If neither holds, we need to tell the client that atomicity cannot be guaranteed and that it must give up.

For {{Region}}, the same check applies even if allowPartial is disabled, as we need cross-row atomicity.

And I think {{None}} here is the same thing as 'stateless' in HBASE-15576.

Thanks.
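The three cases in the comment above can be condensed into a small decision function. This is only a sketch of the proposal as worded there: the enum `Atomicity`, the class `ReopenCheck`, and the parameter names are hypothetical and do not correspond to an existing HBase API.

```java
/** Hypothetical scan option from the proposal; not part of the HBase client API. */
enum Atomicity { NONE, ROW, REGION }

/** Hypothetical server-side check run when a scanner is reopened with an mvcc read point. */
final class ReopenCheck {
    private ReopenCheck() {}

    /** Whether this reopen needs the old read point to still be valid. */
    static boolean needsConsistentView(Atomicity atomicity, boolean allowPartial, boolean failedMidRow) {
        switch (atomicity) {
            case NONE:
                return false;                       // any reopen is acceptable
            case ROW:
                return allowPartial && failedMidRow; // only a half-returned row is at risk
            default:
                return true;                        // REGION: cross-row atomicity always needs it
        }
    }

    /** True if the reopen may proceed; false means the client must be told to give up. */
    static boolean mayReopen(Atomicity atomicity, boolean allowPartial, boolean failedMidRow,
                             long scannerReadPoint, long smallestReadPoint,
                             boolean inNoMajorCompactionPeriod) {
        if (!needsConsistentView(atomicity, allowPartial, failedMidRow)) {
            return true;
        }
        return scannerReadPoint >= smallestReadPoint || inNoMajorCompactionPeriod;
    }
}
```

When `mayReopen` returns false, the server would report that atomicity cannot be guaranteed, and the client would surface an exception instead of retrying.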
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701071#comment-15701071 ]

ramkrishna.s.vasudevan commented on HBASE-17177:
---

bq. I do not think a region can be moved if there is major compaction running for its storefiles?

Then it's fine. I know this is how it worked long ago, but of late I have not kept up with the changes to assignment and region movement. So it is fine then.
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701061#comment-15701061 ]

Duo Zhang commented on HBASE-17177:
---

Sorry, I do not get your point. I do not think a region can be moved if there is a major compaction running for its storefiles?
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700974#comment-15700974 ]

ramkrishna.s.vasudevan commented on HBASE-17177:
---

If a major compaction has already started and a region move happens at that time, can we still delay the major compaction of the newly moved region? Maybe we should complete the major compaction of the other regions and then come back to this one, so that overall there is no delay in completing the major compaction.
[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15698034#comment-15698034 ]

Duo Zhang commented on HBASE-17177:
---

[~stack] [~ram_krish] Is the description clear enough?

As for the solution, maybe we could disable major compaction for a short time right after the region comes online? Maybe several minutes?

Thanks.