[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2018-01-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309321#comment-16309321
 ] 

Duo Zhang commented on HBASE-17177:
---

I have not had enough time to work on this...

The reason we need this is that, when updating meta, we may use batchMutate to 
make sure that operations on multiple rows are atomic, for example when we are 
splitting or merging a table's regions. But when scanning, we can only guarantee 
row-level atomicity, which may lead to an inconsistent result.

I think a temporary solution may be to introduce a flag for the ClientScanner 
that disables restarting a remote scanner: if we hit an exception which indicates 
that we need to restart the remote scanner, we just fail the scan and tell the 
upper layer. We can enable this flag when doing critical meta scanning.
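
A minimal sketch of the proposed flag, under stated assumptions: none of the 
names below (SketchClientScanner, RemoteScanner, ScannerFactory, 
ScannerLostException, FlakyFactory, allowRestart) are HBase APIs. They are 
hypothetical stand-ins that only illustrate "fail the scan instead of 
restarting the remote scanner".

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical signal that the remote scanner is gone, e.g. after a region move. */
class ScannerLostException extends RuntimeException {
    ScannerLostException(String message) { super(message); }
}

/** Hypothetical stand-in for a server-side scanner. */
interface RemoteScanner {
    String next(); // next row key, or null when the scan is exhausted
}

/** Hypothetical factory: opens a remote scanner positioned after afterRow (null = start). */
interface ScannerFactory {
    RemoteScanner open(String afterRow);
}

class SketchClientScanner {
    private final ScannerFactory factory;
    private final boolean allowRestart; // the flag discussed above (name is an assumption)
    private RemoteScanner remote;
    private String lastRow;

    SketchClientScanner(ScannerFactory factory, boolean allowRestart) {
        this.factory = factory;
        this.allowRestart = allowRestart;
        this.remote = factory.open(null);
    }

    String next() {
        while (true) {
            try {
                String row = remote.next();
                if (row != null) {
                    lastRow = row;
                }
                return row;
            } catch (ScannerLostException e) {
                if (!allowRestart) {
                    // Fail the whole scan and let the upper layer decide what to do.
                    throw new IllegalStateException(
                        "remote scanner lost and restart is disabled", e);
                }
                // Default behaviour: silently reopen after the last row returned.
                remote = factory.open(lastRow);
            }
        }
    }
}

/** Test double: the first scanner it opens dies after serving one row. */
class FlakyFactory implements ScannerFactory {
    private final List<String> rows;
    private int opened = 0;

    FlakyFactory(List<String> rows) { this.rows = rows; }

    @Override
    public RemoteScanner open(String afterRow) {
        final boolean flaky = (opened++ == 0);
        final Iterator<String> it = rows.stream()
            .filter(r -> afterRow == null || r.compareTo(afterRow) > 0)
            .iterator();
        final int[] served = {0};
        return () -> {
            if (flaky && served[0] >= 1) {
                throw new ScannerLostException("region moved");
            }
            served[0]++;
            return it.hasNext() ? it.next() : null;
        };
    }
}
```

With allowRestart set to true the sketch hides the lost scanner and keeps 
returning rows; with it set to false the second call to next() throws, which is 
the behaviour a critical meta scan would opt into.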

Thanks.

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Priority: Critical
> Fix For: 3.0.0
>
>
> We know that a major compaction physically deletes the cells that are 
> covered by a delete marker. To give a scan a consistent view, we 
> use a map to track the read points of all scanners open on a region, and 
> the smallest one is used by a compaction: any delete marker whose 
> mvcc is greater than this value is not used to delete other cells.
> The problem with a scan restart after a region move is that the new RS does 
> not know about the scanners opened at the old RS until the 
> client sends scan requests to the new RS. This means the read point map is 
> incomplete, and the smallest read point may be greater than the correct value. 
> If a major compaction happens at that time, it may delete some cells that 
> should be kept.
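
The mechanism in the description can be sketched roughly as follows. This is an 
illustration only, not HBase's region code: ReadPointTracker and all of its 
methods are hypothetical names, and the fallback to the region's current read 
point when no scanner is open is an assumption of the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-region tracker of scanner read points. */
class ReadPointTracker {
    private final Map<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
    private volatile long currentReadPoint;

    ReadPointTracker(long currentReadPoint) {
        this.currentReadPoint = currentReadPoint;
    }

    void openScanner(long scannerId, long readPoint) {
        scannerReadPoints.put(scannerId, readPoint);
    }

    void closeScanner(long scannerId) {
        scannerReadPoints.remove(scannerId);
    }

    /** Smallest read point of any open scanner, or the current read point if none is open. */
    long smallestReadPoint() {
        long min = currentReadPoint;
        for (long readPoint : scannerReadPoints.values()) {
            min = Math.min(min, readPoint);
        }
        return min;
    }

    /** A compaction may only apply delete markers whose mvcc is not above the smallest read point. */
    boolean mayApplyDeleteMarker(long markerMvcc) {
        return markerMvcc <= smallestReadPoint();
    }
}
```

For example, if the old RS tracks a scanner with read point 5, a delete marker 
with mvcc 8 is not applied there. A fresh tracker on the new RS, whose current 
read point is 10 and which has not yet heard from that scanner, would allow the 
same marker to be applied, which is the inconsistency this issue describes.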



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2017-11-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234024#comment-16234024
 ] 

Duo Zhang commented on HBASE-17177:
---

I think we should get this in. Will take a look at this.

Thanks.

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Priority: Critical
> Fix For: 1.5.0, 2.0.0-beta-1


[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2017-10-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227634#comment-16227634
 ] 

stack commented on HBASE-17177:
---

What do you reckon for this and hbase2, [~Apache9]?

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Priority: Critical
> Fix For: 1.5.0, 2.0.0-beta-1


[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717525#comment-15717525
 ] 

stack commented on HBASE-17177:
---

I like your idea that we keep deleted cells if a compaction runs inside the 
scanner timeout after open (and that a minor cannot graduate to major).

How do we make this cryptic behavior 'obvious' to the operator?

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0


[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717522#comment-15717522
 ] 

stack commented on HBASE-17177:
---

We already note whether a file is the product of a major compaction. As you 
suggest, we could add the read point to the hfile metadata.

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0


[jira] [Commented] (HBASE-17177) Compaction can break the region/row level atomic when scan even if we pass mvcc to client

2016-12-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714238#comment-15714238
 ] 

Duo Zhang commented on HBASE-17177:
---

Oh, I made a mistake... Even a minor compaction can reclaim the deleted 
cells; the difference with a major compaction is that it can also reclaim the 
delete marker itself...

So in general, we need to record an mvcc below which we may have deleted some 
cells, so that a read below it may not see all the cells. And when a region is 
newly opened, we need to freeze this value for a short period (maybe the scanner 
TTL, as [~yangzhe1991] proposed above), either by disabling compaction or by 
setting KeepDeletedCells to true during compaction.
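
One way to picture the "freeze after open" idea is the sketch below. It is an 
assumption-laden illustration, not a proposed patch: PurgePointGuard, its 
fields, and the canServe check are hypothetical, and time is passed in 
explicitly to keep the example deterministic.

```java
/** Hypothetical guard around the recorded mvcc below which cells may have been purged. */
class PurgePointGuard {
    private final long openedAtMillis; // when the region was opened on this RS
    private final long freezeMillis;   // e.g. the scanner TTL
    private long purgePoint;           // recorded value, assumed to be restored at region open

    PurgePointGuard(long purgePointAtOpen, long openedAtMillis, long freezeMillis) {
        this.purgePoint = purgePointAtOpen;
        this.openedAtMillis = openedAtMillis;
        this.freezeMillis = freezeMillis;
    }

    /** True while the region was opened so recently that old scanners may still reconnect. */
    boolean frozen(long nowMillis) {
        return nowMillis - openedAtMillis < freezeMillis;
    }

    /**
     * Called by a compaction. While frozen, the purge point does not advance, which
     * stands in for "skip compaction" or "compact while keeping deleted cells".
     * Returns the mvcc below which the compaction may drop deleted cells.
     */
    long onCompaction(long smallestReadPoint, long nowMillis) {
        if (!frozen(nowMillis)) {
            purgePoint = Math.max(purgePoint, smallestReadPoint);
        }
        return purgePoint;
    }

    /** Assumption of this sketch: a scan is only safe if its read point is not below the purge point. */
    boolean canServe(long scanReadPoint) {
        return scanReadPoint >= purgePoint;
    }
}
```

With a purge point of 3 at open and a 60 second freeze, a compaction that runs 
one second after open leaves the purge point at 3, so a restarted scan with 
read point 5 is still safe. Once the window has passed, the same compaction 
may advance the purge point to the smallest tracked read point.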

Thanks.

> Compaction can break the region/row level atomic when scan even if we pass 
> mvcc to client
> -
>
> Key: HBASE-17177
> URL: https://issues.apache.org/jira/browse/HBASE-17177
> Project: HBase
> Issue Type: Sub-task
> Components: scan
> Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0