[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714140#comment-15714140
 ] 

Phil Yang commented on HBASE-17177:
---

I think first we need to know whether we can return a consistent view to a 
reopened scanner, regardless of whether the region has moved. So we should record 
the minReadPoint of the last major compaction, and a region should learn it when 
it is opened. We can add a field to the HFile header: if the file was generated 
by a major compaction, this field holds the minReadPoint that the compaction used. 
After that, when a scanner comes in, we will know whether we can return a 
consistent view.
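A minimal Java sketch of the check, assuming each store file exposes the proposed header field as a nullable `Long` (null when the file was not written by a major compaction). `ConsistentViewCheck` and its methods are hypothetical names, not HBase APIs. The reasoning: a major compaction with minReadPoint R applies every delete marker with mvcc <= R, so a reopened scanner whose read point is older than R may have lost cells that, from its point of view, were never deleted. Taking the maximum across files is a conservative choice assumed here.

```java
import java.util.Collection;
import java.util.Objects;
import java.util.OptionalLong;

/** Hypothetical sketch: can a reopened scanner still get a consistent view? */
final class ConsistentViewCheck {

    private ConsistentViewCheck() {}

    /**
     * Largest minReadPoint recorded by any major-compacted file (hypothetical
     * header field; null means the file was not written by a major compaction).
     */
    static OptionalLong lastMajorCompactionReadPoint(Collection<Long> headerFields) {
        return headerFields.stream()
                .filter(Objects::nonNull)
                .mapToLong(Long::longValue)
                .max();
    }

    /**
     * Consistent only if no major compaction has applied a delete marker that
     * is newer than the scanner's read point.
     */
    static boolean canServeConsistentView(long scannerReadPoint, OptionalLong compactionReadPoint) {
        return !compactionReadPoint.isPresent()
                || scannerReadPoint >= compactionReadPoint.getAsLong();
    }
}
```

If `canServeConsistentView` returns false, the region server would have to reject the reopened scanner instead of silently returning an inconsistent result.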

We already have a TTL (hbase.client.scanner.timeout.period) for scanners on the 
server. If there are no requests within TTL milliseconds, we can remove the 
scanner. So I think that when we open a region, we can wait the same amount of 
time before allowing a major compaction. Although the scanner may already have 
expired on the former RS, this is safe, and the TTL is not a long time.

> Major compaction can break the region/row level atomic when scan even if we 
> pass mvcc to client
> ---
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
>  Issue Type: Sub-task
>  Components: scan
>Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> We know that a major compaction actually deletes the cells that are covered 
> by a delete marker. In order to give a consistent view for a scan, we need a 
> map that tracks the read points of all scanners of a region, and the smallest 
> one is used for a compaction. Delete markers whose mvcc is greater than this 
> value are not used to delete other cells.
> The problem with restarting a scan after a region move is that the new RS 
> does not know about the scanners opened at the old RS until the client sends 
> scan requests to the new RS. This means the read points map is incomplete and 
> the smallest read point may be greater than the correct value. So if a major 
> compaction happens at that time, it may delete some cells which should be 
> kept.
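The bookkeeping described in the issue can be sketched as follows, assuming scanners are identified by a `long` id and that, with no open scanners, the region's current read point is used. `ScannerReadPoints` is a hypothetical class, not HBase's own implementation. On a freshly opened region the map starts empty, so a scanner that is still alive on the old RS but has not yet contacted the new one is simply not counted, which is exactly the gap described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of per-region scanner read point tracking. */
final class ScannerReadPoints {
    private final ConcurrentMap<Long, Long> readPointByScannerId = new ConcurrentHashMap<>();

    void scannerOpened(long scannerId, long readPoint) {
        readPointByScannerId.put(scannerId, readPoint);
    }

    void scannerClosed(long scannerId) {
        readPointByScannerId.remove(scannerId);
    }

    /**
     * Smallest read point among open scanners. Assumption: with no scanners
     * registered, fall back to the region's current read point.
     */
    long smallestReadPoint(long currentRegionReadPoint) {
        long min = currentRegionReadPoint;
        for (long rp : readPointByScannerId.values()) {
            min = Math.min(min, rp);
        }
        return min;
    }

    /** A compaction may apply a delete marker only if its mvcc <= smallest read point. */
    static boolean mayApplyDeleteMarker(long markerMvcc, long smallestReadPoint) {
        return markerMvcc <= smallestReadPoint;
    }
}
```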



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714130#comment-15714130
 ] 

Duo Zhang commented on HBASE-17177:
---

{quote}
Not sure about NONE/ROW/REGION. Can we do REGION first, since mvcc is by 
region, and then if needed do ROW and NONE.
{quote}

NONE/ROW/REGION is the lower bound: if there is no error, we will always have 
REGION level atomicity. The problem only happens when there is an error and we 
need to reopen a scanner. We will try our best to keep REGION level atomicity 
but, as said above, we cannot always succeed. If that happens, we will use the 
'atomicity' option to decide whether we can go on or must throw an exception 
to the user.

Thanks.



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714094#comment-15714094
 ] 

stack commented on HBASE-17177:
---

A region opens after a move, and a major compaction could start. It would look 
for the smallest read point. There might be none, so it would think it could 
clean up all deletes.

After that, a restarted scan comes in with an mvcc that is older than the 
current read point.

The region does not keep a record of the mvcc that the last (or currently 
running) major compaction used. If it did, we could fail the scan if its mvcc 
was older than that of the major compaction.

Yeah, it seems smart to delay major compaction until a good while after a 
region opens, so restarted scanners have a chance of getting back in. Can we 
find a latch that is not time based (wait a few minutes)?

Compactions get promoted from minor to major if it happens that the minor 
compaction includes all hfiles. We'd have to undo this or not allow the upgrade.
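One way to "not allow the upgrade" is sketched below, assuming the compaction selector knows how many store files were selected and whether the region is still in its post-open waiting period. `CompactionPromotion` is hypothetical and does not reflect how HBase's compaction policies are actually structured.

```java
/**
 * Hypothetical sketch: a minor compaction that happens to select every store
 * file is normally promoted to a major compaction. During the post-open
 * waiting period, refuse the promotion so delete markers are kept.
 */
final class CompactionPromotion {
    enum Kind { MINOR, MAJOR }

    private CompactionPromotion() {}

    static Kind decide(int selectedFiles, int totalFiles, boolean inQuietPeriod) {
        boolean coversAllFiles = selectedFiles == totalFiles;
        if (coversAllFiles && !inQuietPeriod) {
            return Kind.MAJOR; // usual behaviour: upgrade to a major compaction
        }
        return Kind.MINOR; // keep delete markers; no upgrade
    }
}
```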

Not sure about NONE/ROW/REGION. Can we do REGION first, since mvcc is by 
region, and then if needed do ROW and NONE.

This is an awkward problem. 



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713934#comment-15713934
 ] 

Duo Zhang commented on HBASE-17177:
---

I have been thinking about this for days. I think we should have an option for 
scan called 'atomicity' which has three values: {{None}}, {{Row}} and 
{{Region}}. The default value will be {{Row}}.

This will change the way errors are handled at the client side.

For {{None}}, in general we can recover from any exception by reopening a new 
region scanner, unless the scan has timed out.

For {{Row}}, if allowPartial is enabled and we failed in the middle of a row, 
then it is not always safe to reopen a new scanner. We need to do something at 
the server side. If the RS gets an open-scanner request that carries an mvcc 
read point, it needs to check whether the read point is greater than or equal 
to the current smallest read point, or whether we are in the 'no major 
compaction period' introduced above. If neither holds, we need to tell the 
client that atomicity cannot be guaranteed and that it must give up.

For {{Region}}, the above will also happen even if allowPartial is disabled, 
as we need cross-row atomicity.

And I think {{None}} here is the same thing as 'stateless' in HBASE-15576.

Thanks.
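The error-handling rules above can be sketched as follows. This is a minimal Java sketch under stated assumptions: `ScanAtomicity`, its `Level` enum and its methods are hypothetical names, not part of the HBase client API. The server-side condition mirrors the proposal: the reopen is accepted if the scanner's read point is not smaller than the current smallest read point, or if the region is still in its no-major-compaction period.

```java
/** Hypothetical sketch of the proposed 'atomicity' scan option. */
final class ScanAtomicity {
    enum Level { NONE, ROW, REGION }

    private ScanAtomicity() {}

    /** Whether reopening a scanner after an error needs the server-side check. */
    static boolean needsReadPointCheck(Level level, boolean allowPartial, boolean failedMidRow) {
        switch (level) {
            case NONE:
                return false; // always safe to reopen (unless timed out)
            case ROW:
                return allowPartial && failedMidRow; // only a half-returned row is at risk
            case REGION:
                return true; // cross-row atomicity is always at risk
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    /** Server side: accept the reopen if the read point is still protected. */
    static boolean serverAcceptsReopen(long scannerReadPoint, long smallestReadPoint,
                                       boolean inQuietPeriod) {
        return scannerReadPoint >= smallestReadPoint || inQuietPeriod;
    }

    /** Throws if atomicity cannot be guaranteed; otherwise the client may go on. */
    static void checkReopen(Level level, boolean allowPartial, boolean failedMidRow,
                            long scannerReadPoint, long smallestReadPoint, boolean inQuietPeriod) {
        if (needsReadPointCheck(level, allowPartial, failedMidRow)
                && !serverAcceptsReopen(scannerReadPoint, smallestReadPoint, inQuietPeriod)) {
            throw new IllegalStateException("atomicity " + level + " cannot be guaranteed");
        }
    }
}
```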



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-11-27 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701071#comment-15701071
 ] 

ramkrishna.s.vasudevan commented on HBASE-17177:


bq. I do not think a region can be moved if there is major compaction running 
for its storefiles?
Then it's fine. I know this is how it worked long ago, but lately I am not 
aware of the changes to assignment/region movement. So it is fine then.



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-11-27 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701061#comment-15701061
 ] 

Duo Zhang commented on HBASE-17177:
---

Sorry, I do not get your point. I do not think a region can be moved if there 
is major compaction running for its storefiles?



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-11-27 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15700974#comment-15700974
 ] 

ramkrishna.s.vasudevan commented on HBASE-17177:


If a major compaction has already started and a region move happens at that 
time, can we still delay the major compaction of the newly moved region? 
Maybe we should complete the major compaction of the other regions first and 
then come back to this one, so that overall there is no delay in completing 
the major compaction.
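The reordering idea can be sketched as a stable sort that pushes regions still inside their post-open waiting period to the end of the major compaction queue. `MajorCompactionOrdering` and its `Region` holder are hypothetical; how a real region server schedules major compactions is not modeled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: compact settled regions first, newly moved ones last. */
final class MajorCompactionOrdering {
    static final class Region {
        final String name;
        final boolean recentlyOpened; // still inside the post-open waiting period

        Region(String name, boolean recentlyOpened) {
            this.name = name;
            this.recentlyOpened = recentlyOpened;
        }
    }

    private MajorCompactionOrdering() {}

    /** Stable sort (false before true); original order kept within each group. */
    static List<Region> order(List<Region> regions) {
        List<Region> ordered = new ArrayList<>(regions);
        ordered.sort(Comparator.comparing((Region r) -> r.recentlyOpened));
        return ordered;
    }
}
```

Because `List.sort` is stable, regions keep their existing priority among themselves; only the recently moved ones are deferred.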



[jira] [Commented] (HBASE-17177) Major compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-11-26 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15698034#comment-15698034
 ] 

Duo Zhang commented on HBASE-17177:
---

[~stack] [~ram_krish] Is the description clear enough?

And for the solution, maybe we could disable major compaction for a short 
period right after the region comes online? Maybe several minutes?

Thanks. 
