Re: Help with recover data

2012-06-04 Thread Stefan Behrens
What have you done? Why do you need to recover data? What happened? A
power failure? A kernel crash?

On Tue, 29 May 2012 18:14:53 -0400, Maxim Mikheev wrote:
 I recently decided to use btrfs. It worked perfectly for a week, even
 under heavy load. Yesterday I deleted my backups, as I cannot afford to
 keep ~10TB of backups. I decided to switch to btrfs because it was
 announced that it is stable already.
 I need to recover ~5TB of data; this data is important and I do not
 have backups.


Re: Help with recover data

2012-06-04 Thread Maxim Mikheev

It was a kernel panic from btrfs.
I had around 40 parallel processes of reading/writing.

On 06/04/2012 08:24 AM, Stefan Behrens wrote:

What have you done? Why do you need to recover data? What happened? A
power failure? A kernel crash?

[original message snipped]



Re: Help with recover data

2012-06-04 Thread Maxim Mikheev
By the way, if the data can be recovered, I can easily reproduce the
crash, so it could serve as a real-life heavy-load test.
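
For illustration, a rough sketch of that kind of load (the mount point,
file count, and sizes are assumptions, not taken from the thread):

  # create 20 files, then rewrite and re-read them in parallel
  # (~40 concurrent streams on the btrfs mount)
  mkdir -p /tank/stress
  for i in $(seq 1 20); do
      dd if=/dev/zero of=/tank/stress/f.$i bs=1M count=1024 2>/dev/null
  done
  for i in $(seq 1 20); do
      dd if=/dev/zero of=/tank/stress/f.$i bs=1M count=1024 2>/dev/null &
      dd if=/tank/stress/f.$i of=/dev/null bs=1M 2>/dev/null &
  done
  wait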


On 06/04/2012 08:24 AM, Stefan Behrens wrote:

[quoted text snipped]



Re: Help with recover data

2012-06-04 Thread Stefan Behrens
On Mon, 04 Jun 2012 08:26:43 -0400, Maxim Mikheev wrote:
 It was a kernel panic from btrfs.
 I had around 40 parallel processes of reading/writing.

Do you have a stack trace for this kernel panic, something with the
terms "BUG", "WARNING" and/or "Call Trace" in /var/log/kern.log or
/var/log/syslog (or in the old /var/log/syslog.?.gz /var/log/kern.log.?.gz)?

And how are the disks connected, via USB or something else?

Is there an MD, LVM or encryption layer below btrfs in your setup?

Was the filesystem almost full?
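
A hedged way to gather the answers to the questions above (log paths
from the question itself; the mount point is an assumption):

  # look for the panic in current and rotated logs
  zgrep -E 'BUG|WARNING|Call Trace' /var/log/kern.log* /var/log/syslog*
  # how the disks are attached
  lsblk -o NAME,TRAN,SIZE,TYPE
  # how full the filesystem is
  df -h /tank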


Re: Help with recover data

2012-06-04 Thread Maxim Mikheev
After looking at kern.log, it looks like I had a RAID card failure, and
data was not stored properly on one of the disks (/dev/sde).
Btrfs didn't recognize the disk failure and kept trying to write data
until the reboot.


Some other tests after reboot show that /dev/sde has generation 9095
while the other 4 disks have 9096.
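
One way to check this per device (a sketch; it assumes a btrfs-progs
build that ships btrfs-show-super, called 'btrfs inspect-internal
dump-super' in later releases, and that the device list matches your
array):

  # print the superblock generation of each member device; the disk
  # that missed writes reports an older value (9095 vs 9096 here)
  for d in /dev/sd[b-f]; do    # adjust to your actual devices
      printf '%s: ' "$d"
      btrfs-show-super "$d" | awk '/^generation/ {print $2}'
  done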


This case shows that btrfs does not recognize and does not handle such
errors. Handling them probably needs to be added, as RAID cards can fail.


More importantly, btrfs cannot recover automatically when one disk has
lost some data.


The next question: how do I roll back to generation 9095?
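
A hedged sketch of the approach the rest of the thread converges on:
locate an older tree root with btrfs-find-root, then pull files out
read-only with btrfs-restore against that root. The block number below
is the generation-9095 root reported by btrfs-find-root later in this
thread; take yours from your own output.

  # list candidate tree roots and their generations
  sudo btrfs-find-root /dev/sdb
  # restore files using the root whose generation is 9095
  # (destination directory is an assumption; any empty dir works)
  sudo btrfs-restore -v -t 4923798065152 /dev/sdb /mnt/recovery/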

On 06/04/2012 10:08 AM, Maxim Mikheev wrote:

Disks were connected to a RocketRAID 2760 directly as JBOD.

There is no LVM, MD or encryption. I used plain disks directly.

The file system was 55% full (1.7TB of 3TB on each disk).

Logs are attached.
The error happened on May 29 at 13:55.

The log contains ZFS errors from May 27; that is why I decided to
switch to btrfs. At the moment of failure, no ZFS was installed on the
system.



On 06/04/2012 09:03 AM, Stefan Behrens wrote:

[quoted text snipped]



Re: Help with recover data

2012-06-04 Thread Stefan Behrens
On Mon, 04 Jun 2012 10:08:54 -0400, Maxim Mikheev wrote:
 [quoted text snipped]

According to the kern.1.log file that you have sent (which is not
visible on the mailing list because it exceeded the 100,000-character
limit of vger.kernel.org), a rebalance operation was active when the
disks or the RAID controller started to cause IO errors.

There seems to be a bug: it looks like a write failure is ignored in
btrfs. For instance, the result of barrier_all_devices() is ignored.
Afterwards the superblocks are written, referencing trees which have
not been completely written to disk.


...
May 29 13:08:07 s0 kernel: [46017.194519] btrfs: relocating block group
7236780818432 flags 9
May 29 13:08:36 s0 kernel: [46046.149492] btrfs: found 18543 extents
May 29 13:09:03 s0 kernel: [46072.944773] btrfs: found 18543 extents
May 29 13:09:04 s0 kernel: [46074.317760] btrfs: relocating block group
7235707076608 flags 20
...
May 29 13:55:56 s0 kernel: [48882.551881]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1858:port 6 slot 1
rx_desc 30001 has error info80008000.
May 29 13:55:56 s0 kernel: [48882.551918]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_94xx.c 626:command active
FCFD,  slot [1].
May 29 13:55:56 s0 kernel: [48882.552084] btrfs csum failed ino 62276
off 1019039744 csum 1546305812 private 3211821089
May 29 13:55:56 s0 kernel: [48882.552241] btrfs csum failed ino 62276
off 1018056704 csum 3750159096 private 3390793248
...
May 29 13:55:56 s0 kernel: [48882.553791] btrfs csum failed ino 62276
off 1018712064 csum 872056089 private 2640477920
May 29 13:55:56 s0 kernel: [48882.554528]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1858:port 6 slot 1
rx_desc 30001 has error info0001.
May 29 13:55:56 s0 kernel: [48882.554541]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_94xx.c 626:command active
FF3FFEFD,  slot [1].
May 29 13:55:56 s0 kernel: [48882.555626]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1858:port 6 slot 22
rx_desc 30016 has error info0100.
May 29 13:55:56 s0 kernel: [48882.555635]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_94xx.c 626:command active
FF3FFEFB,  slot [16].
May 29 13:55:56 s0 kernel: [48882.555659] sd 8:0:3:0: [sde] command
880006c57800 timed out
May 29 13:56:00 s0 kernel: [48886.313989] sd 8:0:3:0: [sde] command
88117af65700 timed out
...
May 29 13:56:00 s0 kernel: [48886.314186] sas: Enter
sas_scsi_recover_host busy: 31 failed: 31
May 29 13:56:00 s0 kernel: [48886.314204] sas: trying to find task
0x881083807640
May 29 13:56:00 s0 kernel: [48886.314210] sas: sas_scsi_find_task:
aborting task 0x881083807640
May 29 13:56:00 s0 kernel: [48886.314220]
/home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1632:mvs_abort_task()
mvi=8837faa8 task=881083807640 slot=8837faaa5140 slot_idx=x3
May 29 13:56:00 s0 kernel: [48886.314231] sas: sas_scsi_find_task: task
0x881083807640 is aborted
May 29 13:56:00 s0 kernel: [48886.314236] sas: sas_eh_handle_sas_errors:
task 0x881083807640 is aborted
...
May 29 13:56:00 s0 kernel: [48886.315030] sas: ata10: end_device-8:3:
cmd error handler
May 29 13:56:00 s0 kernel: [48886.315108] sas: ata7: end_device-8:0: dev
error handler
May 29 13:56:00 s0 kernel: [48886.315138] sas: ata8: end_device-8:1: dev
error handler
May 29 13:56:00 s0 kernel: [48886.315168] sas: ata9: end_device-8:2: dev
error handler
May 29 13:56:00 s0 kernel: [48886.315193] sas: ata10: end_device-8:3:
dev error handler
May 29 13:56:00 s0 kernel: [48886.315219] ata10.00: exception Emask 0x1
SAct 0x7fff SErr 0x0 action 0x6 frozen
May 29 13:56:00 s0 kernel: [48886.315239] ata10.00: failed command:
WRITE FPDMA QUEUED
May 29 13:56:00 s0 kernel: [48886.315255] ata10.00: cmd
61/08:00:88:a0:98/00:00:7c:00:00/40 tag 0 ncq 4096 out
May 29 13:56:00 s0 kernel: [48886.315258]  res
41/54:08:68:d6:98/00:00:7c:00:00/40 Emask 0x8d (timeout)
May 29 13:56:00 s0 kernel: [48886.315278] ata10.00: status: { DRDY ERR }
May 29 13:56:00 s0 kernel: [48886.315286] ata10.00: error: { UNC IDNF ABRT }
...
May 29 13:56:54 s0 kernel: [48940.752647] btrfs: run_one_delayed_ref
returned -5
May 29 13:56:54 s0 kernel: [48940.752652] btrfs: run_one_delayed_ref
returned -5
May 29 13:56:54 s0 kernel: [48940.752656]  99 28
May 29 13:56:54 s0 kernel: [48940.752665] [ cut here
]
May 29 13:56:54 s0 kernel: [48940.752669] [ cut here
]
May 29 13:56:54 s0 kernel: [48940.752674]  c2 00
May 29 13:56:54 s0 kernel: [48940.752683] [ cut here
]
May 29 13:56:54 s0 kernel: [48940.752747] WARNING: at
/home/apw/COD/linux/fs/btrfs/super.c:219

Re: Help with recover data

2012-06-04 Thread Maxim Mikheev

Can I roll back to 9095, as all disks have 9095?
How can I send this file to the mailing list?


On 06/04/2012 11:02 AM, Stefan Behrens wrote:

[quoted analysis and log excerpt snipped]

Re: Help with recover data

2012-06-04 Thread Stefan Behrens
On Mon, 04 Jun 2012 11:08:36 -0400, Maxim Mikheev wrote:
 How can I send this file to the mailing list?

Using web space, e.g. http://pastebin.com/


Re: Help with recover data

2012-06-04 Thread Maxim Mikheev

pastebin.com has a 500K limit.

I put file here: http://www.4shared.com/archive/I8cU3K43/kernlog1.html?
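
An alternative that fits under pastebin-sized limits (a sketch; the
file name is from this thread, and the chunk size is chosen to stay
below 500K):

  # compress the log, encode it as text, and split it into parts
  gzip -9 -c kern.1.log | base64 > kern.1.log.gz.b64
  split -b 400k -d kern.1.log.gz.b64 kern.1.log.part.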

On 06/04/2012 11:11 AM, Stefan Behrens wrote:

On Mon, 04 Jun 2012 11:08:36 -0400, Maxim Mikheev wrote:

How can I send this file to the mailing list?

Using web space, e.g. http://pastebin.com/



Re: Help with recover data

2012-06-04 Thread Maxim Mikheev

Is there any chance to fix it and recover the data after such a failure?

On 06/04/2012 11:02 AM, Stefan Behrens wrote:

[quoted analysis and log excerpt snipped]

Re: Help with recover data

2012-06-04 Thread Ryan C. Underwood
On Mon, Jun 04, 2012 at 05:02:26PM +0200, Stefan Behrens wrote:
 
 According to the kern.1.log file that you have sent (which is not
 visible on the mailing list because it exceeded the 100,000-character
 limit of vger.kernel.org), a rebalance operation was active when the
 disks or the RAID controller started to cause IO errors.
 
 There seems to be a bug: it looks like a write failure is ignored in
 btrfs. For instance, the result of barrier_all_devices() is ignored.
 Afterwards the superblocks are written, referencing trees which have
 not been completely written to disk.

This may also be what happened when my hardware RAID blew up. I was
left with two completely inconsistent/unusable btrfs filesystems which
I am still attempting to recover.

Assuming that the general mount options to remount read-only on errors
are correctly handled by btrfs, that would seem to be the wise thing
to do. IMO, a volume which experiences a metadata write error on the
underlying medium should be made immediately read-only anyway.
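
For reference, this is the ext2/3/4-style option being alluded to (a
sketch only; whether btrfs honors an equivalent is exactly the open
question here, so do not read this as a btrfs feature):

  # ext* semantics: remount the filesystem read-only on the first
  # metadata error instead of continuing to write
  mount -o errors=remount-ro /dev/sdb /tank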

-- 
Ryan C. Underwood, neme...@icequake.net




Re: Help with recover data

2012-05-29 Thread Felix Blanke



On 5/30/12 12:14 AM, Maxim Mikheev wrote:

Hi Everyone,

I recently decided to use btrfs. It worked perfectly for a week, even
under heavy load. Yesterday I deleted my backups, as I cannot afford to
keep ~10TB of backups. I decided to switch to btrfs because it was
announced that it is stable already.
I need to recover ~5TB of data; this data is important and I do not
have backups.



Just out of curiosity: who announced that btrfs is stable already?! The
kernel says something different, and there is still no 100% working
fsck for btrfs. IMHO it is far away from being stable :)


And btw: even if it were stable, always keep backups of important data,
ffs! I don't understand why there are still technically experienced
people who don't do backups :/ IMHO, if you don't keep backups of a
portion of your data, that data is considered not to be important.




uname -a
Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

sudo mount -o recovery /dev/sdb /tank
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

dmesg:
[ 9612.971149] device fsid c9776e19-37eb-4f9c-bd6b-04e8dde97682 devid 2
transid 9096 /dev/sdb
[ 9613.048476] btrfs: enabling auto recovery
[ 9613.048482] btrfs: disk space caching is enabled
[ 9621.172540] parent transid verify failed on 5468060241920 wanted 9096
found 7621
[ 9621.181369] parent transid verify failed on 5468060241920 wanted 9096
found 7621
[ 9621.182167] btrfs read error corrected: ino 1 off 5468060241920 (dev
/dev/sdd sector 2143292648)
[ 9621.182181] Failed to read block groups: -5
[ 9621.193680] btrfs: open_ctree failed

sudo /usr/local/bin/btrfs-find-root /dev/sdb
...
Well block 4455562448896 seems great, but generation doesn't match,
have=9092, want=9096
Well block 4455568302080 seems great, but generation doesn't match,
have=9091, want=9096
Well block 4848395739136 seems great, but generation doesn't match,
have=9093, want=9096
Well block 4923796594688 seems great, but generation doesn't match,
have=9094, want=9096
Well block 4923798065152 seems great, but generation doesn't match,
have=9095, want=9096
Found tree root at 5532762525696


$ sudo btrfs-restore -v -t 4923798065152 /dev/sdb ./
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
parent transid verify failed on 4923798065152 wanted 9096 found 9095
Ignoring transid failure
Root objectid is 5
Restoring ./Irina
Restoring ./Irina/.idmapdir2
Skipping existing file ./Irina/.idmapdir2/4.bucket.lock
If you wish to overwrite use the -o option to overwrite
Skipping existing file ./Irina/.idmapdir2/7.bucket
Skipping existing file ./Irina/.idmapdir2/15.bucket
Skipping existing file ./Irina/.idmapdir2/12.bucket.lock
Skipping existing file ./Irina/.idmapdir2/cap.txt
Skipping existing file ./Irina/.idmapdir2/5.bucket
Restoring ./Irina/.idmapdir2/10.bucket.lock
Restoring ./Irina/.idmapdir2/6.bucket.lock
Restoring ./Irina/.idmapdir2/8.bucket
ret is -3


sudo btrfs-zero-log /dev/sdb
...
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468231311360 wanted 9096 found 7621
parent transid verify failed on 5468060102656 wanted 9096 found 7621
Ignoring transid failure
leaf parent key incorrect 59310080
btrfs-zero-log: extent-tree.c:2578: alloc_reserved_tree_block: Assertion
`!(ret)' failed.

Help me please.

Max


Re: Help with recover data

2012-05-29 Thread cwillu
On Tue, May 29, 2012 at 5:14 PM, Felix Blanke felixbla...@gmail.com wrote:


[quoted text snipped]

Some distros do offer support, but that's usually in the sense of "if
you have a support contract (and are on qualified hardware, using it in
a supported configuration), we'll help you fix what breaks (and we're
confident we can)", rather than a claim that things will never break.

I expect (but haven't actually checked recently) that such distros
actively backport btrfs fixes to their supported kernels (btrfs in
Distro X's 3.2 kernel may have fixes that Distro Y's 3.2 kernel does
not, etc.), which can lead to unfortunate misunderstandings; we don't
have enough information yet to determine whether that's the case here,
though.