[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-03-05 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181663#comment-15181663
 ] 

Jianwei Cui commented on HBASE-15340:
-

{quote}
The solution of having a client aware readPnt will solve even that(?)
{quote}
It seems [HBASE-13099|https://issues.apache.org/jira/browse/HBASE-13099] has 
proposed such a solution: 
https://issues.apache.org/jira/browse/HBASE-13099?focusedCommentId=14337017=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14337017.
 However, there are cases the solution can't cover (if I am not wrong). For 
example:
1. The client holds the readPoint obtained when the scanner was created on 
serverA, and the client has read partial row data from serverA.
2. The region is moved to another server, serverB, before the whole row is 
returned.
3. Before the client creates a new scanner for the row with the readPoint on 
serverB, new mutations are applied to the region, including deletes for the 
row, and a major compaction runs and completes.
The major compaction could delete the cells of the row because the new server 
can't get a proper smallestReadPoint for the compaction before all ongoing scan 
requests have arrived. The client then cannot read the remaining cells of the 
row after the compaction, which breaks per-row atomicity for the scan. 
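
A toy model of the scenario above, as a hedged Python sketch rather than HBase code. `VersionedStore`, `mutate`, `read` and `major_compact` are hypothetical names, and the compaction rule is a simplification: per column it keeps everything newer than the smallest read point plus the single newest version at or below it, and drops that version too if it is a delete marker.

```python
import copy

DELETE = None  # stands in for a delete marker (tombstone)


class VersionedStore:
    """Hypothetical in-memory stand-in for one region's cells."""

    def __init__(self):
        self.versions = {}     # column -> list of (seq_id, value)
        self.write_point = 0   # sequence id of the latest mutation

    def mutate(self, columns):
        # One mutation gets one sequence id for all of its columns.
        self.write_point += 1
        for column, value in columns.items():
            self.versions.setdefault(column, []).append((self.write_point, value))

    def read(self, column, read_point):
        # Newest version visible at read_point; None if deleted or missing.
        visible = [v for v in self.versions.get(column, []) if v[0] <= read_point]
        return max(visible, key=lambda v: v[0])[1] if visible else None

    def major_compact(self, smallest_read_point):
        for column, cells in list(self.versions.items()):
            newer = [v for v in cells if v[0] > smallest_read_point]
            older = [v for v in cells if v[0] <= smallest_read_point]
            kept = []
            if older:
                newest = max(older, key=lambda v: v[0])
                if newest[1] is not DELETE:
                    kept.append(newest)
            self.versions[column] = kept + newer


store = VersionedStore()
store.mutate({"c1": "value1", "c2": "value1"})
client_read_point = store.write_point            # held by the client
first = store.read("c1", client_read_point)      # partial row read on serverA

store.mutate({"c1": DELETE, "c2": DELETE})       # delete arrives after the move

informed = copy.deepcopy(store)
# serverB has not seen the client's scan yet, so it compacts at its own write point.
store.major_compact(smallest_read_point=store.write_point)
# For contrast: a compaction that did know about the client's read point.
informed.major_compact(smallest_read_point=client_read_point)

lost = store.read("c2", client_read_point)
kept = informed.read("c2", client_read_point)
print(first, lost, kept)
```

In this model `lost` is `None` even though the client presents its original read point, while `kept` is still `value1`. Carrying the read point on the client is therefore not enough unless the new server also learns of it before it compacts.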

> Partial row result of scan may return data violates the row-level transaction 
> --
>
> Key: HBASE-15340
> URL: https://issues.apache.org/jira/browse/HBASE-15340
> Project: HBase
> Issue Type: Bug
> Components: Scanners, Transactions/MVCC
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
>
> There are cases where the region server will return a partial row result, such 
> as when the client sets a batch for the scan or the configured size limit is 
> reached. In these situations, the client may return data that violates the 
> row-level transaction to the application. The following steps show the problem:
> {code}
> // Assume there is a test table 'test_table' with one family 'F' and one 
> // region 'region', and two region servers 'rsA' and 'rsB'.
> 1. Let 'region' initially be located on 'rsA' and put one row with two columns 
> 'c1' and 'c2':
> > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1'
> 2. Start a client to scan 'test_table' with scan.setBatch(1) and 
> scan.setCaching(1). The client will get one column, {column='F:c1', 
> value='value1'}, in the first rpc call after the scanner is created, and the 
> result will be returned to the application.
> 3. Before the client issues the next request, 'region' is moved to 'rsB' 
> and accepts another mutation for the two columns 'c1' and 'c2':
> > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2'
> 4. Then the client will receive a RegionMovedException when issuing the next 
> request and will retry to open a scanner on 'rsB'. The newly opened scanner 
> will have a higher mvcc than the old data, so it could read out the column 
> {column='F:c2', value='value2'} and return the result to the application.
> Therefore, the application will get data as:
> 'row'  column='F:c1',  value='value1'
> 'row'  column='F:c2',  value='value2'
> The returned data is combined from two different mutations and violates 
> the row-level transaction.
> {code}
> The reason is that the scanner opened after the region move gets a 
> different mvcc. I am not sure whether this result is by design for scan if 
> partial row results are allowed. However, a row result combined from 
> different transactions may leave the application in an unexpected state.
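
The four steps in the description can be replayed with a small in-memory model. This is a hedged Python sketch, not HBase code: `Region`, `put`, `open_scanner` and `read` are hypothetical names, and the sequence-id bookkeeping only imitates how a scanner's mvcc read point hides newer cells.

```python
class Region:
    """Toy model of one region; every put gets a single sequence id."""

    def __init__(self):
        self.cells = []        # (row, column, seq_id, value)
        self.write_point = 0   # sequence id of the latest committed put

    def put(self, row, columns):
        # All columns written by one put share a sequence id (row-level atomicity).
        self.write_point += 1
        for column, value in columns.items():
            self.cells.append((row, column, self.write_point, value))

    def open_scanner(self):
        # A newly opened scanner takes the current write point as its read point.
        return self.write_point

    def read(self, row, column, read_point):
        # Newest value of the column that is visible at read_point.
        visible = [(seq, value) for r, c, seq, value in self.cells
                   if r == row and c == column and seq <= read_point]
        return max(visible)[1] if visible else None


region = Region()
region.put("row", {"F:c1": "value1", "F:c2": "value1"})      # step 1

read_point_a = region.open_scanner()                         # step 2, scanner on rsA
first = region.read("row", "F:c1", read_point_a)

region.put("row", {"F:c1": "value2", "F:c2": "value2"})      # step 3, after the move

read_point_b = region.open_scanner()                         # step 4, scanner on rsB
second = region.read("row", "F:c2", read_point_b)

print(first, second)
```

Here `first` is `value1` and `second` is `value2`: the two halves of the row come from different puts because the reopened scanner took a newer read point.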



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168832#comment-15168832
 ] 

Jianwei Cui commented on HBASE-15340:
-

{quote}
The solution of having a client aware readPnt will solve even that(?)
{quote}
It seems workable IMO. I will try to find whether there is any discussion about 
this issue.



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168829#comment-15168829
 ] 

Jianwei Cui commented on HBASE-15340:
-

After [HBASE-11544|https://issues.apache.org/jira/browse/HBASE-11544], the 
maxScannerResultSize of ClientScanner defaults to 2MB. This makes the server 
return partial results more easily when the size limit is reached, so this 
issue can happen even when the user does not set a batch for the scan.  
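
How a size limit alone can split a row, sketched in Python. `split_row_by_size` is a hypothetical helper, and the size accounting (length of the column name plus the value) is a simplification of whatever the server actually measures. Only the 2MB figure comes from the comment above.

```python
def split_row_by_size(cells, max_result_size):
    """Split one row's cells into partial results once the size limit is reached.

    cells: list of (column, value_bytes). Size accounting is simplified.
    """
    partials, current, current_size = [], [], 0
    for column, value in cells:
        current.append((column, value))
        current_size += len(column) + len(value)
        if current_size >= max_result_size:
            # Limit reached: hand back what we have as one partial result.
            partials.append(current)
            current, current_size = [], 0
    if current:
        partials.append(current)
    return partials


MAX_RESULT_SIZE = 2 * 1024 * 1024                                   # 2MB
row = [("F:c%d" % i, b"x" * (1024 * 1024)) for i in range(1, 6)]    # five 1MB cells
partials = split_row_by_size(row, MAX_RESULT_SIZE)
sizes = [len(p) for p in partials]
print(sizes)
```

With no batch set at all, the five-cell row comes back as three partial results in this model, and each boundary between them is a point where a region move could interleave with a concurrent write.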



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168785#comment-15168785
 ] 

Anoop Sam John commented on HBASE-15340:


Yep. This is a known issue then. The solution of having a client-aware 
readPnt will solve even that (?) That work has to consider compatibility as 
well: old client -> new RS, and the reverse.
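
One way to picture the client-aware readPnt idea, as a hedged Python sketch. The `read_point` argument of `open_scanner` is a hypothetical API, not an existing HBase call: the client remembers the read point of its first scanner and passes it back when it reopens the scanner after a region move. The compatibility concern shows up as the `None` default, which is what an old client that sends no read point would get.

```python
class Region:
    """Toy region model; names and behaviour are assumptions for illustration."""

    def __init__(self):
        self.cells = {}        # (row, column) -> list of (seq_id, value)
        self.write_point = 0

    def put(self, row, columns):
        self.write_point += 1
        for column, value in columns.items():
            self.cells.setdefault((row, column), []).append((self.write_point, value))

    def open_scanner(self, read_point=None):
        # Old clients send nothing and get a fresh read point.
        # New clients may resume at a read point they already hold.
        if read_point is None:
            return self.write_point
        return min(read_point, self.write_point)

    def read(self, row, column, read_point):
        visible = [v for v in self.cells.get((row, column), []) if v[0] <= read_point]
        return max(visible)[1] if visible else None


region = Region()
region.put("row", {"F:c1": "value1", "F:c2": "value1"})
held_read_point = region.open_scanner()                   # scanner on the first RS
first = region.read("row", "F:c1", held_read_point)

region.put("row", {"F:c1": "value2", "F:c2": "value2"})   # written after the move

old_client = region.read("row", "F:c2", region.open_scanner())
new_client = region.read("row", "F:c2", region.open_scanner(held_read_point))
print(first, old_client, new_client)
```

The resumed scanner returns `value1` for `F:c2`, matching the first half of the row, while the old-style reopen returns `value2`. The sketch ignores whether the server still retains the older versions.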



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168771#comment-15168771
 ] 

Jianwei Cui commented on HBASE-15340:
-

[~anoop.hbase], thanks for your comment, I get your point:). Yes, the case you 
mentioned will happen. The page https://hbase.apache.org/acid-semantics.html 
explains the consistency guarantee for scan:
{code}
A scan is not a consistent view of a table. Scans do not exhibit snapshot 
isolation.

Rather, scans have the following properties:

1. Any row returned by the scan will be a consistent view (i.e. that version of 
the complete row existed at some point in time) [1]
2. A scan will always reflect a view of the data at least as new as the 
beginning of the scan. This satisfies the visibility guarantees enumerated 
below.
   1. For example, if client A writes data X and then communicates via a side 
      channel to client B, any scans started by client B will contain data at 
      least as new as X.
   2. A scan _must_ reflect all mutations committed prior to the construction 
      of the scanner, and _may_ reflect some mutations committed subsequent to 
      the construction of the scanner.
   3. Scans must include all data written prior to the scan (except in the 
      case where data is subsequently mutated, in which case it _may_ reflect 
      the mutation)
{code}
It seems the consistency guarantee for scan only ensures reading data at least 
as new as the beginning of the scan, with no guarantee about whether data 
written concurrently or after the beginning of the scan is read. 

At the end of the page:
{code}
[1] A consistent view is not guaranteed intra-row scanning -- i.e. fetching a 
portion of a row in one RPC then going back to fetch another portion of the row 
in a subsequent RPC. Intra-row scanning happens when you set a limit on how 
many values to return per Scan#next (See Scan#setBatch(int)).
{code}
It mentions the problem of this jira: a row-level consistent view is not 
guaranteed for intra-row scanning. So this is a known problem?



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168742#comment-15168742
 ] 

Anoop Sam John commented on HBASE-15340:


Not just intra-row, I would say. Consider even a normal Scan with writes 
happening in parallel. A row 'r5' (say with only one cell in it) is inserted 
after the scan begins. If there is no region move in between, we won't see this 
row at all: the cell will be removed from the returned result by the seqId 
check against the readPnt. But if there is a region move in between, we may see 
it. So it is a question of consistency w.r.t. results as well. Get my point? 
Just saying.. With intra-row results (by setting batch on Scan / result 
chunking) this becomes a more visible issue.
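
The seqId check can be sketched as a simple filter, in Python. `scan_rows` and the `(row, seq_id)` tuples are hypothetical. The point is only that the same data yields different rows depending on whether the scanner keeps its original readPnt or gets a new one after a region move.

```python
def scan_rows(cells, read_point, start_after=None):
    """Return the rows whose cell seq_id is visible at read_point, in row order."""
    rows = sorted(row for row, seq_id in cells if seq_id <= read_point)
    if start_after is not None:
        # Resume the scan after the last row the client already received.
        rows = [row for row in rows if row > start_after]
    return rows


cells = [("r1", 1), ("r3", 2), ("r7", 3)]
read_point = 3                                  # readPnt taken when the scan begins
seen = scan_rows(cells, read_point)[:1]         # the client has consumed r1 so far

cells.append(("r5", 4))                         # r5 is inserted after the scan began

# No region move: same scanner, same readPnt, r5 is filtered out.
without_move = seen + scan_rows(cells, read_point, start_after=seen[-1])
# Region move: the scanner is reopened with a new readPnt, r5 becomes visible.
with_move = seen + scan_rows(cells, 4, start_after=seen[-1])

print(without_move, with_move)
```

Both outcomes are allowed by the documented scan semantics for whole rows, but they show why the result of a scan depends on whether a region move happened in between.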



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168736#comment-15168736
 ] 

Jianwei Cui commented on HBASE-15340:
-

[~anoop.hbase], the intra-row scanning seems to come from 
[HBASE-1537|https://issues.apache.org/jira/browse/HBASE-1537], so versions 
after 0.90.0 will have this issue. I will make a patch following the idea and 
check the result :)



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168727#comment-15168727
 ] 

ramkrishna.s.vasudevan commented on HBASE-15340:


bq. When HBASE-15325 is resolved, there is no data miss, however, the returned 
data may be combined from different row-level transactions which is unexpected 
for application. 
Ya, got it now.



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168702#comment-15168702
 ] 

Anoop Sam John commented on HBASE-15340:


And this is an issue in all versions of HBase I think. From day one we have 
this issue (?)



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168700#comment-15168700
 ] 

Anoop Sam John commented on HBASE-15340:


After seeing an issue around partial results during a region move yesterday, I 
was thinking about this. And the solution you mentioned was also the first one 
that came to my mind :-) Yes, when the client recreates the scanner (because of 
NSRE or a region move), the ReadPoint MVCC stuff gets broken, as the new 
Scanner will have a new readPnt.



[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168678#comment-15168678
 ] 

Jianwei Cui commented on HBASE-15340:
-

[~ram_krish], this is a different problem caused by a region move during 
scanning IMO. When [HBASE-15325|https://issues.apache.org/jira/browse/HBASE-15325] is 
resolved, there is no data miss; however, the returned data may be combined from 
different row-level transactions, which is unexpected for the application. I 
think we should also keep the READ_COMMITTED isolation level in this situation?






[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168642#comment-15168642
 ] 

ramkrishna.s.vasudevan commented on HBASE-15340:


Is this the same as https://issues.apache.org/jira/browse/HBASE-15325? That 
issue also talks about partial row results when the region moves.






[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

2016-02-26 Thread Jianwei Cui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168630#comment-15168630
 ] 

Jianwei Cui commented on HBASE-15340:
-

A direct solution is to make ClientScanner record the readPoint when the 
scanner for the region is first opened; scanners subsequently opened for the 
same region then reuse that readPoint if a RegionMovedException happens. Any 
suggestions? 
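
To illustrate the idea outside HBase, here is a minimal sketch in a toy MVCC model. All names (PinnedReadPointSketch, Region, PinningScanner, replay) are hypothetical and none of this is the real ClientScanner code. It also assumes the older cell versions are still available on the new server when the scanner is reopened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not HBase code) of a scanner that pins its first read point. */
class PinnedReadPointSketch {

    /** One cell version: the value plus the sequence id of the mutation that wrote it. */
    static final class Version {
        final long seqId;
        final String value;

        Version(long seqId, String value) {
            this.seqId = seqId;
            this.value = value;
        }
    }

    /** A region holding a single row: column name -> versions in write order. */
    static final class Region {
        private final Map<String, List<Version>> columns = new TreeMap<>();
        private long lastSeqId = 0;

        /** Applies one row mutation atomically: every cell gets the same sequence id. */
        void put(String... columnValuePairs) {
            long seqId = ++lastSeqId;
            for (int i = 0; i < columnValuePairs.length; i += 2) {
                columns.computeIfAbsent(columnValuePairs[i], c -> new ArrayList<>())
                       .add(new Version(seqId, columnValuePairs[i + 1]));
            }
        }

        /** The read point a brand-new scanner would get. */
        long latestReadPoint() {
            return lastSeqId;
        }

        /** Returns the newest value of the column that is visible at readPoint. */
        String read(String column, long readPoint) {
            String visible = null;
            for (Version v : columns.getOrDefault(column, Collections.emptyList())) {
                if (v.seqId <= readPoint) {
                    visible = v.value;
                }
            }
            return visible;
        }
    }

    /** Client-side scanner that records the read point of its first open and reuses it. */
    static final class PinningScanner {
        private long pinnedReadPoint = -1; // -1 means the scanner was never opened

        /** Opens or reopens the scanner; only the very first open takes a read point. */
        void open(Region region) {
            if (pinnedReadPoint < 0) {
                pinnedReadPoint = region.latestReadPoint();
            }
        }

        /** Reads one column at the pinned read point. */
        String next(Region region, String column) {
            return region.read(column, pinnedReadPoint);
        }
    }

    /** Replays the scenario from the description with the pinned read point. */
    static List<String> replay() {
        Region region = new Region();
        region.put("c1", "value1", "c2", "value1");
        PinningScanner scanner = new PinningScanner();
        scanner.open(region);                        // first open, on rsA
        String first = scanner.next(region, "c1");
        region.put("c1", "value2", "c2", "value2"); // mutation accepted after the move
        scanner.open(region);                        // reopen after RegionMovedException
        String second = scanner.next(region, "c2");
        return Arrays.asList("c1=" + first, "c2=" + second);
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [c1=value1, c2=value1]
    }
}
```

With the read point pinned, both cells come from the first mutation, so the row stays consistent across the reopen.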



