[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-26 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16649:

   Resolution: Fixed
Fix Version/s: 1.2.4, 0.98.23, 1.1.7, 1.3.0, 2.0.0
       Status: Resolved  (was: Patch Available)

> Truncate table with splits preserved can cause both data loss and truncated 
> data appeared again
> ---
>
> Key: HBASE-16649
> URL: https://issues.apache.org/jira/browse/HBASE-16649
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.3
> Reporter: Allan Yang
> Assignee: Matteo Bertozzi
> Fix For: 2.0.0, 1.3.0, 1.1.7, 0.98.23, 1.2.4
>
> Attachments: HBASE-16649-v0.patch, HBASE-16649-v1.patch, HBASE-16649-v2.patch
>
>
> Because truncating a table with splits preserved deletes the hfiles but reuses
> the previous regioninfo, it can cause odd behavior:
> - Case 1: *Data reappears after truncate*
> Steps to reproduce:
> 1. create a table, say 'test'
> 2. write data to 'test', making sure the memstore of 'test' is not empty
> 3. truncate 'test' with splits preserved
> 4. kill the regionserver hosting the region(s) of 'test'
> 5. start the regionserver again: the truncated data reappears in table 'test'
> - Case 2: *Data loss*
> Steps to reproduce:
> 1. create a table, say 'test'
> 2. write some data to 'test' (any amount)
> 3. truncate 'test' with splits preserved
> 4. restart the regionserver to reset the seqid
> 5. write some data, but less than in step 2, so that the seqid does not run
> past the one reached in step 2
> 6. kill the regionserver hosting the region(s) of 'test'
> 7. restart the regionserver: all the data written in step 5 is lost
> *Why?*
> For case 1:
> Preserving splits in the truncate table procedure does not change the
> regioninfo, so when log replay happens, the 'unflushed' data is restored to
> the region.
> For case 2:
> The Master stores flushedSequenceIdByRegion in a map keyed by the region's
> encodedName. Although the table is truncated, the region's name does not
> change because we chose to preserve the splits. So after the truncate, the
> region's sequenceid is reset in the regionserver but not in the Master. When a
> flush is reported to the Master, the Master rejects the sequenceid update
> because the new value is smaller than the old one. The same happens during log
> replay: all the edits written in step 5 are skipped because they have a
> smaller seqid.
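The *Why?* reasoning above can be modeled in a few lines. This is a minimal Python sketch, not HBase code: `WalEdit`, `Master`, `report_flush`, `replay` and the region name are hypothetical stand-ins, and the model assumes only the two rules the report describes (the master keeps one flushed seqid per encoded region name and rejects smaller updates; log replay skips edits whose seqid is not newer than that value).

```python
from dataclasses import dataclass, field


@dataclass
class WalEdit:
    region: str  # encoded region name the edit was logged under (hypothetical)
    seqid: int   # sequence id assigned by the regionserver
    row: str


@dataclass
class Master:
    # Hypothetical stand-in for the master-side flushedSequenceIdByRegion map,
    # keyed by encoded region name. Simplified; not the HBase data structure.
    flushed_seqid: dict = field(default_factory=dict)

    def report_flush(self, region: str, seqid: int) -> bool:
        # Per the report: an update smaller than the recorded value is rejected.
        if seqid < self.flushed_seqid.get(region, -1):
            return False
        self.flushed_seqid[region] = seqid
        return True


def replay(wal: list, master: Master, region: str) -> list:
    # Replay only edits logged under `region` that are newer than the last
    # flushed seqid the master holds for that name.
    last = master.flushed_seqid.get(region, -1)
    return [e.row for e in wal if e.region == region and e.seqid > last]


REGION = "abc123"  # hypothetical encoded name, unchanged by a split-preserving truncate

# Case 1: the edits are still only in the WAL when the table is truncated.
# The store files are gone, but the WAL entries carry the same region name.
wal = [WalEdit(REGION, s, f"old-{s}") for s in (1, 2, 3)]
case1 = replay(wal, Master(), REGION)        # the truncated rows come back

# Case 2: seqid 10 was flushed and recorded before the truncate; after the
# restart the regionserver counts from 1 again under the same region name.
master = Master()
master.report_flush(REGION, 10)
wal = [WalEdit(REGION, s, f"new-{s}") for s in (1, 2, 3)]
accepted = master.report_flush(REGION, 3)    # rejected: 3 < 10
case2 = replay(wal, master, REGION)          # every new edit is skipped

print(case1, accepted, case2)                # ['old-1', 'old-2', 'old-3'] False []
```

Both symptoms follow from the same fact in this model: the encoded region name is the only key, so state recorded for the pre-truncate region is applied to the post-truncate one.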



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-23 Thread Matteo Bertozzi (JIRA)


Matteo Bertozzi updated HBASE-16649:

Attachment: HBASE-16649-v2.patch



[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-19 Thread Matteo Bertozzi (JIRA)


Matteo Bertozzi updated HBASE-16649:

Attachment: HBASE-16649-v1.patch



[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-19 Thread Matteo Bertozzi (JIRA)


Matteo Bertozzi updated HBASE-16649:

Assignee: Matteo Bertozzi
  Status: Patch Available  (was: Open)



[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-18 Thread Matteo Bertozzi (JIRA)


Matteo Bertozzi updated HBASE-16649:

Attachment: HBASE-16649-v0.patch

Not sure why we have always ignored that the server manager keeps region info,
but throwing it away on region removal would be a good thing anyway, not just as
a fix for this. I think creating a new region name on truncate will also make
debugging easier.

Attached a v0 draft that does both.
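The two ideas in the comment above (a new region name on truncate, and dropping the master's per-region info when a region is removed) can be illustrated with a toy model. This is a hedged sketch, not the v0 patch: `encoded_name`, `truncate_preserve`, `remove_region` and the id counter are hypothetical, and hashing a string with MD5 only mimics the idea that a new region id yields a new encoded name.

```python
import hashlib
import itertools

# Hypothetical source of fresh region ids (HBase has its own scheme).
_region_ids = itertools.count(1000)


def encoded_name(table: str, start_key: str, region_id: int) -> str:
    # Illustrative only: a different region id gives a different encoded name.
    return hashlib.md5(f"{table},{start_key},{region_id}".encode()).hexdigest()


class Master:
    def __init__(self) -> None:
        self.flushed_seqid: dict = {}  # keyed by encoded region name

    def report_flush(self, region: str, seqid: int) -> bool:
        if seqid < self.flushed_seqid.get(region, -1):
            return False  # smaller than the recorded seqid: rejected
        self.flushed_seqid[region] = seqid
        return True

    def remove_region(self, region: str) -> None:
        # Idea 1: forget per-region state when the region is removed.
        self.flushed_seqid.pop(region, None)


def truncate_preserve(master: Master, table: str, start_keys: list, old_regions: list) -> list:
    # Idea 2: keep the split points, but give every region a fresh name.
    for name in old_regions:
        master.remove_region(name)
    return [encoded_name(table, key, next(_region_ids)) for key in start_keys]


master = Master()
old = [encoded_name("test", key, 1) for key in ("", "m")]
master.report_flush(old[0], 10)              # flushed before the truncate
new = truncate_preserve(master, "test", ["", "m"], old)
accepted = master.report_flush(new[0], 3)    # fresh name: no stale seqid to beat
print(accepted, old[0] in master.flushed_seqid)  # True False
```

Either change alone prevents the rejected flush in this model; a fresh name also means WAL edits logged under an old name no longer match any new region, which is why it addresses the reappearing-data case as well.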



[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-18 Thread Allan Yang (JIRA)


Allan Yang updated HBASE-16649:
---
Description: 
Because truncating a table with splits preserved deletes the hfiles but reuses
the previous regioninfo, it can cause odd behavior:

- Case 1: *Data reappears after truncate*
Steps to reproduce:
1. create a table, say 'test'
2. write data to 'test', making sure the memstore of 'test' is not empty
3. truncate 'test' with splits preserved
4. kill the regionserver hosting the region(s) of 'test'
5. start the regionserver again: the truncated data reappears in table 'test'

- Case 2: *Data loss*
Steps to reproduce:
1. create a table, say 'test'
2. write some data to 'test' (any amount)
3. truncate 'test' with splits preserved
4. restart the regionserver to reset the seqid
5. write some data, but less than in step 2, so that the seqid does not run past
the one reached in step 2
6. kill the regionserver hosting the region(s) of 'test'
7. restart the regionserver: all the data written in step 5 is lost

*Why?*
For case 1:
Preserving splits in the truncate table procedure does not change the
regioninfo, so when log replay happens, the 'unflushed' data is restored to the
region.
For case 2:
The Master stores flushedSequenceIdByRegion in a map keyed by the region's
encodedName. Although the table is truncated, the region's name does not change
because we chose to preserve the splits. So after the truncate, the region's
sequenceid is reset in the regionserver but not in the Master. When a flush is
reported to the Master, the Master rejects the sequenceid update because the new
value is smaller than the old one. The same happens during log replay: all the
edits written in step 5 are skipped because they have a smaller seqid.

  was:
Since truncate table with splits preserved will delete hfiles and use the 
previous regioninfo. It can cause odd behaviors

- Case 1: *Data appeared after truncate*
reproduce procedure:
1. create a table, let's say 'test'
2. write data to 'test', make sure memstore of 'test' is not empty
3. truncate 'test' with splits preserved
4. kill the regionserver hosting the region(s) of 'test'
5. start the regionserver, now it is the time to witness the miracle! the 
truncated data appeared in table 'test'

- Case 2: *Data loss*
reproduce procedure:
1. create a table, let's say 'test'
2. write some data to 'test', no matter how many
3. truncate 'test' with splits preserved
4. write some data, but less than 2 since we don't want the seqid to run over 
the one in 2
5. kill the regionserver hosting the region(s) of 'test'
6. restart the regionserver. Congratulations! the data writen in 4 is now all 
lost

*Why?*
for case 1
Since preserve splits in truncate table procedure will not change the 
regioninfo, when log replay happens, the 'unflushed' data will restore back to 
the region
for case 2
since the flushedSequenceIdByRegion are stored in Master in a map with the 
region's encodedName. Although the table is truncated, the region's name is not 
changed since we chose to preserve the splits. So after truncate the table, the 
region's sequenceid is reset in the regionserver, but not reset in master. When 
flush comes and report to master, master will reject the update of sequenceid 
since the new one is smaller than the old one. The same happens in log replay, 
all the edits writen in 4 will be skipped since they have a smaller seqid



[jira] [Updated] (HBASE-16649) Truncate table with splits preserved can cause both data loss and truncated data appeared again

2016-09-18 Thread Allan Yang (JIRA)


Allan Yang updated HBASE-16649:
---
Description: 
Since truncate table with splits preserved will delete hfiles and use the 
previous regioninfo. It can cause odd behaviors

- Case 1: *Data appeared after truncate*
reproduce procedure:
1. create a table, let's say 'test'
2. write data to 'test', make sure memstore of 'test' is not empty
3. truncate 'test' with splits preserved
4. kill the regionserver hosting the region(s) of 'test'
5. start the regionserver, now it is the time to witness the miracle! the 
truncated data appeared in table 'test'

- Case 2: *Data loss*
reproduce procedure:
1. create a table, let's say 'test'
2. write some data to 'test', no matter how many
3. truncate 'test' with splits preserved
4. write some data, but less than 2 since we don't want the seqid to run over 
the one in 2
5. kill the regionserver hosting the region(s) of 'test'
6. restart the regionserver. Congratulations! the data writen in 4 is now all 
lost

*Why?*
for case 1
Since preserve splits in truncate table procedure will not change the 
regioninfo, when log replay happens, the 'unflushed' data will restore back to 
the region
for case 2
since the flushedSequenceIdByRegion are stored in Master in a map with the 
region's encodedName. Although the table is truncated, the region's name is not 
changed since we chose to preserve the splits. So after truncate the table, the 
region's sequenceid is reset in the regionserver, but not reset in master. When 
flush comes and report to master, master will reject the update of sequenceid 
since the new one is smaller than the old one. The same happens in log replay, 
all the edits writen in 4 will be skipped since they have a smaller seqid

  was:
Since truncate table with splits preserved will delete hfiles and use the 
previous regioninfo. It can cause odd behaviors

- Case 1: *Data appeared after truncate*
reproduce procedure:
1. create a table, let's say 'test'
2. write data to 'test', make sure memstore of 'test' is not empty
3. truncate 'test' with splits preserved
4. kill the regionserver hosting the region(s) of 'test'
5. start the regionserver, now it is the time to witness the miracle! the 
truncated data appeared in table 'test'

- Case 2: *Data loss*
reproduce procedure:
1. create a table, let's say 'test'
2. write some data to 'test', no matter how many
3. truncate 'test' with splits preserved
4. write some data, but less than 2 since we don't want the seqid to run over 
the one in 2
5. kill the regionserver hosting the region(s) of 'test'
6. restart the regionserver. Congratulations! the data writen in 4 is now all 
lost

*Why?*
for case 1
Since preserve splits in truncate table procedure will not change the 
regioninfo, when log replay happens, the 'unflushed' data will restore back to 
the region
for case 2
since the flushedSequenceIdByRegion are stored in Master in a map with the 
region's encodedName. Although the table is truncated, but the region's name is 
not changed since we chose to preserve the splits. So after truncate the table, 
the region's sequenceid is reset in the regionserver, but not reset in master. 
When flush comes and report to master, master will reject the update of 
sequenceid since the new one is smaller than the old one. So in log replay, all 
the edits writen in 4 will be skipped since they have a smaller seqid

