Re: cancel btrfs delete job

2014-06-27 Thread Satoru Takeuchi

Hi Franziska,


(2014/06/27 14:00), Franziska Näpelt wrote:

Hi!

After about 12 hours of booting, the system is now running.


Congratulations!


The fifth harddrive is still in the btrfs-pool.

Here is the log from the crash that happened while the btrfs delete job was running:

Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957248] ------------[ cut here ]------------
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957268] WARNING: CPU: 3 PID:
31131 at fs/btrfs/super.c:259 __btrfs_abort_transaction+0x46/0xf8 [
btrfs]()
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957270] Modules linked in:
xts gf128mul tun parport_pc ppdev lp parport bnep rfcomm bluetooth rf
kill pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) cpufreq_powersave
cpufreq_userspace cpufreq_stats cpufreq_conservative vboxdrv(O) binfm
t_misc fuse nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache
sunrpc ext2 dm_crypt hwmon_vid loop firewire_sbp2 snd_hda_codec_hdmi snd
_hda_intel joydev radeon ttm drm_kms_helper iTCO_wdt iTCO_vendor_support
snd_hda_controller drm i2c_algo_bit snd_hda_codec snd_hwdep snd_pcm
  i7core_edac snd_timer edac_core snd soundcore psmouse acpi_cpufreq
coretemp processor kvm_intel kvm microcode lpc_ich mfd_core pcspkr asus_
atk0110 ehci_pci mxm_wmi i2c_i801 i2c_core wmi serio_raw thermal_sys
evdev button ext4 crc16 jbd2 mbcache btrfs xor raid6_pq dm_mod raid1 md
_mod sg sd_mod crct10dif_generic crc_t10dif crct10dif_common hid_generic
usbhid hid crc32c_intel firewire_ohci r8169 firewire_core mii crc_i
tu_t sata_sil ahci libahci sata_mv uhci_hcd ehci_hcd
Jun 25 20:34:59 hsad-srv-03 kernel: libata xhci_hcd scsi_mod usbcore
usb_common
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957309] CPU: 3 PID: 31131
Comm: find Tainted: G   O  3.15.0 #1
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957310] Hardware name:
System manufacturer System Product Name/SABERTOOTH X58, BIOS 1304
08/02/2011
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957311]  
0009 8138b54a 880001593b58
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957313]  81039583
 a01c8123 00b0
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957315]  ffe4
880625fb9000 8801077e8e80 a0247ac0
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957317] Call Trace:
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957321]
[8138b54a] ? dump_stack+0x41/0x51
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957324]
[81039583] ? warn_slowpath_common+0x78/0x90
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957331]
[a01c8123] ? __btrfs_abort_transaction+0x46/0xf8 [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957333]
[81039633] ? warn_slowpath_fmt+0x45/0x4a
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957340]
[a01c8123] ? __btrfs_abort_transaction+0x46/0xf8 [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957348]
[a01d648a] ? __btrfs_free_extent+0x80a/0x84d [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957351]
[8138db1c] ? mutex_trylock+0x10/0x29
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957359]
[a01dabfe] ? __btrfs_run_delayed_refs+0xae4/0xc2b [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957368]
[a01dc86c] ? btrfs_run_delayed_refs+0x7b/0x17e [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957378]
[a01ea1d7] ? __btrfs_end_transaction+0xe5/0x2c0 [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957389]
[a01ee9bb] ? btrfs_dirty_inode+0x8c/0xa7 [btrfs]
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957391]
[8111f12d] ? touch_atime+0xe3/0x11c
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957393]
[81119843] ? iterate_dir+0x7c/0xa2
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957395]
[81119949] ? SyS_getdents+0x74/0xca
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957397]
[811196ee] ? filldir64+0xdd/0xdd
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957399]
[81394522] ? system_call_fastpath+0x16/0x1b
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957400] ---[ end trace
8392ac15dafb7de4 ]---
Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957422] BTRFS info (device
sdh): forced readonly


Your delete job seems to have failed at __btrfs_abort_transaction(),
which resulted in a read-only remount at that time.


In addition, if you run into this kind of situation again, the
skip_balance mount option may help. With it, an interrupted balance
is not resumed at mount time. Please also see the following thread,
in which Marc tried to balance a btrfs filesystem and it hung.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/35791
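
As a minimal sketch of the skip_balance suggestion above, the sequence
could look like the following. The device /dev/sdX and the mount point
/mnt/btrfs are placeholders and are not taken from the log. The function
only prints the commands; they would have to be run as root, and whether
a delete-triggered relocation is covered in the same way is an assumption.

```shell
#!/bin/bash
# Sketch: mount without resuming an interrupted balance, then inspect it.
# The commands are only printed, not executed.

skip_balance_cmds() {
    local dev=$1 mnt=$2
    # skip_balance keeps an interrupted balance paused at mount time.
    echo "mount -o skip_balance $dev $mnt"
    echo "btrfs balance status $mnt"
    # Afterwards, either resume the paused balance or cancel it for good.
    echo "btrfs balance resume $mnt"
    echo "btrfs balance cancel $mnt"
}

skip_balance_cmds /dev/sdX /mnt/btrfs
```

For a permanent setting, skip_balance could also be appended to the
option list in fstab.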

Thanks,
Satoru





After that event, there are further entries in the messages log, but
nothing interesting, only some DHCP messages. Three minutes later, the log
stopped without any message.

Does someone need further logs?


I have no idea.

Thanks,
Satoru



best regards,
Franziska


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: cancel btrfs delete job

2014-06-27 Thread Satoru Takeuchi

Hi Franziska,


(2014/06/26 20:34), Franziska Näpelt wrote:

Hi Satoru,

I'm sorry, but the boot process is still running (I hope so), and I haven't
been able to log in so far. So I currently have no logs.
I don't want to interrupt this process, because there is a lot of
file activity on the hard drive (the LED is blinking).

I'm not sure about the mkfs.btrfs option, because the system was set up
more than one year ago.

mount-options in fstab:
LABEL=btrfs-pool /mnt/btrfs btrfs compress=lzo,degraded 0 1

The kernel version is 3.15 on Debian Wheezy with current btrfs-tools
installed.

Can you estimate how long the boot process (repairing btrfs?) could
take? It has been running for five hours now.


To do so, I tried to follow your steps on a system as similar to
your environment as possible. Unfortunately, I don't have many
disks.


Although you have already managed to mount your btrfs,
I'll share how long these operations take anyway.

Please note that I measured the balance triggered by a delete,
not a balance resumed after a reset during a delete-triggered
balance. Since most of the time in both cases is spent in
balance, the results should be similar.



Environment:
- x86_64 fedora20 KVM guest on x86_64 fedora20 host
- RAM: 4GiB
- kernel: 3.15.2
- Storage: 50GB virt-io disk
  - small devices: /dev/vd[d-g]
  - large devices: /dev/vd[hi]

  # All of these virtual devices are backed by
  # files on a real HDD in the host.

Operations:
 1. Make a Btrfs filesystem
 2. Make a junk file in the filesystem.
 3. Add a large device
 4. Remove a small device and measure how long it takes.

Script:
===
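#!/bin/bash

# Mount point of the test filesystem.
MOUNTPOINT=/home/sat/mnt

# Size of the junk file in MiB (one of the test factors below).
MEGABYTES=4096

# Create a filesystem on the four small devices and mount it.
mkfs.btrfs -f /dev/vdd /dev/vde /dev/vdf /dev/vdg
mount -o compress=lzo /dev/vdd "$MOUNTPOINT"
# Fill it with incompressible data.
dd if=/dev/urandom of="$MOUNTPOINT/junk" oflag=direct bs=1MiB count=$MEGABYTES
# Add a large device, then time the removal of a small one.
btrfs dev add -f /dev/vdh "$MOUNTPOINT"
time btrfs dev del /dev/vdg "$MOUNTPOINT"
umount "$MOUNTPOINT"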
===

Test factors:
  - Device size
    - small: 2GB, large: 3GB
    - small: 4GB, large: 6GB
  - The size of junk file # MEGABYTES parameter of script
    - 1/2 GB
    - 1 GB
    - 2 GB
    - 4 GB

Result (*1):

 device size [GB] | junk file | time [s] | time / junk
------------------+ size [GB] |          | file size
 small  |  large  |           |          | [s/GB]
========+=========+===========+==========+============
   2    |    3    |    1/2    |    5.3   |    10.6
        |         |     1     |    9.6   |     9.6
        |         |     2     |   19.0   |     9.5
--------+---------+-----------+----------+------------
   4    |    6    |    1/2    |    5.1   |    10.2
        |         |     1     |    9.4   |     9.4
        |         |     2     |   17.0   |     8.5
        |         |     4     |   39.3   |     9.8

*1) This data is the average of three tries.

So it seems that the time a delete (and the balance it
triggers) takes is proportional to the used size (the size
of the junk file here). In my case, the delete takes
about 10 [s/GB]. If the small devices were 2TB,
the large devices 3TB, and the junk file 2TB,
this operation would take about 5.4 hours.

Of course, this case is oversimplified and won't
map cleanly onto yours. However, this kind of
measurement can help you estimate the time required
for your next balance operation.
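
The extrapolation above can be reproduced with a short sketch. It
assumes that the time scales linearly with the used data, that the
rate is the mean of the seven s/GB values in the result table, and
that 2TB is taken as 2000GB; the function name estimate_hours is
only illustrative.

```shell
#!/bin/bash
# Sketch: extrapolate the delete time from the measured rates.

# Print the estimated hours for a given amount of used data in GB.
estimate_hours() {
    local used_gb=$1
    # Measured rates in s/GB, copied from the result table.
    printf '%s\n' 10.6 9.6 9.5 10.2 9.4 8.5 9.8 |
        awk -v gb="$used_gb" '
            { sum += $1; n++ }
            END { printf "%.1f\n", (sum / n) * gb / 3600 }'
}

estimate_hours 2000   # prints 5.4
```

The mean rate is about 9.66 s/GB, so 2000GB gives roughly 19300
seconds, which rounds to the 5.4 hours quoted above.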

Thanks,
Satoru





best regards,
Franziska



Hi Franziska,

(2014/06/26 19:05), Franziska Näpelt wrote:

Hello Satoru,

here is the information you requested:

environment:

- four 2 TB disks: /dev/sd[c-f]
- two 3 TB disks: /dev/sdg (so far, only one of them is connected)

The filesystem consists of /dev/sd[c-f].

I wanted to replace /dev/sdc with /dev/sdg (using the add command and,
after that, delete).
In the second step, I wanted to replace the next disk.

But it hung during the btrfs delete command (after a successful add).
The delete process was still in progress, but according to iotop there
was no data transfer.
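
For reference, the add-then-delete replacement described above can be
sketched as follows. The device names are the ones mentioned in this
message and /mnt/btrfs is the mount point from the fstab line quoted
elsewhere in the thread; the function name is only illustrative. The
function prints the commands rather than running them.

```shell
#!/bin/bash
# Sketch: replace one device of a btrfs pool by adding the new device
# and then deleting the old one. The commands are only printed.

replace_by_add_delete() {
    local old_dev=$1 new_dev=$2 mnt=$3
    echo "btrfs device add $new_dev $mnt"
    # The delete relocates all data off the old device, so it can take hours.
    echo "btrfs device delete $old_dev $mnt"
}

replace_by_add_delete /dev/sdc /dev/sdg /mnt/btrfs
```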


Hm, then something bad must have happened in Btrfs.


This morning the whole computer hung and there was no option other
than a reset :(


So, unfortunately, no debug info such as sysrq-w output can be obtained.



It is still trying to boot, with a lot of errors. But I can see that there is
file activity on the hard drive.

There are a lot of messages like the following:
btrfs free space inode generation (0) did not match free space cache generation


And the mount process doesn't finish?

Your filesystem is in an inconsistent state because you
reset the machine during the rebalance that was triggered
by the device deletion.

The following link would help you. But I'm not sure whether
your data can be restored or not.

https://btrfs.wiki.kernel.org/index.php/Btrfsck

Could you tell me your mkfs.btrfs options, mount options,
and kernel version, if possible? I'd like to try to
reproduce your problem anyway.

Thanks,
Satoru




best 

Re: Blocked tasks on 3.15.1

2014-06-27 Thread Tomasz Chmielewski

I've been getting blocked tasks on 3.15.1 generally at times when the
filesystem is somewhat busy (such as doing a backup via scp/clonezilla
writing to the disk).

A week ago I had enabled snapper for a day which resulted in a daily
cleanup of about 8 snapshots at once, which might have contributed,
but I've been limping along since.


I've started seeing similar on several servers, after upgrading to 3.15 
or 3.15.1. With 3.16-rc1 it was even crashing for me.

I've rolled back to the latest 3.14.x, and it's still behaving fine.

I've reported it on the list before, in the thread "btrfs filesystem hang
with 3.15-rc3 when doing rsync".


--
Tomasz Chmielewski
http://www.sslrack.com



Re: [PATCH v2] btrfs-progs: Add uninstall targets to Makefiles.

2014-06-27 Thread David Sterba
On Wed, Jun 25, 2014 at 09:40:40PM +0200, Nils Steinger wrote:
 On Mon, Jun 23, 2014 at 05:04:42PM +0200, David Sterba wrote:
  On Mon, Jun 23, 2014 at 04:23:48AM +0200, Nils Steinger wrote:
   + rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(man8dir)
  
   + rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(libdir)
  
   + rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(bindir)
  
  I don't think it's right to remove the systemwide directories: bindir,
  libdir and man8dir. There rest are btrfs subdirs (eg. incdir), that's
  fine.
 
 On my system, man8dir didn't exist prior to the installation, so I
 thought it would be reasonable to have the uninstallation routine remove
 it.

According to the FHS [1] the manX directories do not have to exist, so
this part shall stay.

 bindir and libdir will exist by default on most systems, so that's a
 different case…
 So, should we really keep the directories around, even if they were
 created by the installation and are now empty (if they aren't, they
 won't be removed anyway)?

But we don't track if the directories were created by the installation
or not.  Normally the directories would exist anyway (/usr or /usr/local
as prefix) and are expected to exist at the locations. Installation to
an arbitrary directory works, but managing the directories is IMO up to
the user.

So are you ok with keeping bindir and libdir only (ie. removing only
man8dir)?

Thanks.

[1] http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREMANMANUALPAGES
and then note #32


Re: [PATCH 5/6] btrfs-progs: limit minimal num of args for btrfs-image

2014-06-27 Thread David Sterba
On Thu, Jun 26, 2014 at 10:53:05AM +0800, Gui Hecheng wrote:
 @@ -2521,6 +2521,9 @@ int main(int argc, char *argv[])
   }
  
   argc = argc - optind;
 + if (argc < 2)

Please use the check_argc_min helper instead. Thanks.

 + print_usage();
 +
   dev_cnt = argc - 1;
  
   if (create) {


Re: Blocked tasks on 3.15.1

2014-06-27 Thread Duncan
Tomasz Chmielewski posted on Fri, 27 Jun 2014 12:02:43 +0200 as excerpted:

 I've been getting blocked tasks on 3.15.1 generally at times when the
 filesystem is somewhat busy (such as doing a backup via scp/clonezilla
 writing to the disk).
 
 I've started seeing similar on several servers, after upgrading to 3.15
 or 3.15.1. With 3.16-rc1 it was even crashing for me.
 I've rolled back to the latest 3.14.x, and it's still behaving fine.
 
 I've reported it on the list before, in the thread "btrfs filesystem hang
 with 3.15-rc3 when doing rsync".

There is a known btrfs lockup bug that was introduced in the commit-
window btrfs pull for 3.16, that was fixed by a pull I believe the day 
before 3.16-rc2.  So 3.16-pre to rc2 is known-bad tho it'll work for a 
few minutes and didn't do any permanent damage that I could see, here.

But from 3.16-rc2 on, the 3.16-pre series has been working fine for me.

For 3.15, I didn't run the pre-releases as I had another project I was 
focusing on, but I experienced no problems with 3.15 itself.  However, my 
use-case is multiple independent small btrfs on partitioned SSD, sub-100-
GB per btrfs, so I'd be less likely to experience the blocked task issues 
that others reported, mostly on TB+ size spinning rust.

And it /did/ seem to me that the frequency of blocked-task reports was 
higher for 3.15 than for previous kernel series, tho 3.15 worked fine for 
me on small btrfs on SSD, the relatively short time I ran it.

Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not 
enough 3.16-rc2+ reports out there from folks experiencing issues with 
3.15 blocked tasks to rightfully say.  What CAN be said is that the known 
3.16-series commit-window btrfs lockups bug that DID affect me was fixed 
right before rc2, and I'm running rc2+ just fine, here.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)

2014-06-27 Thread David Sterba
On Thu, Jun 26, 2014 at 03:38:33PM -0500, Eric Sandeen wrote:
 +FILE ATTRIBUTES
 +---
 +The btrfs filesystem supports setting the following file
 +attributes the `chattr`(1) utility
 +append only (a), no atime updates (A), compressed (c), no copy on write (C),
 +no dump (d), synchronous directory updates (d), immutable (i),
 +synchronous updates (S), and no compression (X).

The formatting is not eye-pleasing.

I've spotted a few mistakes:

* 'd' is listed twice, for sync directory updates it's 'D'

* and 'X' does not mean no compression and never has, although I'd
  like to see a chattr bit for that because we have the corresponding
  inode bit

I've checked your patches, the meaning of 'X' hasn't changed.

I took the opportunity and reformatted the options:

@@ -183,9 +183,24 @@ FILE ATTRIBUTES
 ---
 The btrfs filesystem supports setting the following file
 attributes the `chattr`(1) utility
-append only (a), no atime updates (A), compressed (c), no copy on write (C),
-no dump (d), synchronous directory updates (d), immutable (i),
-synchronous updates (S), and no compression (X).
+
+*a* -- append only
+
+*A* -- no atime updates
+
+*c* -- compressed
+
+*C* -- no copy on write
+
+*d* -- no dump
+
+*D* -- synchronous directory updates
+
+*i* -- immutable
+
+*S* -- synchronous updates

 For descriptions of these attribute flags, please refer to the
 `chattr`(1) man page.
---

looks almost the same in the manpage and gives IMO a good
overview. For initial patch I'm ok with the descriptions, we can enhance it
later with btrfs specifics.

Are you ok with the proposed changes? (I don't want to bother with
resending for simple changes.)


Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)

2014-06-27 Thread Eric Sandeen
On 6/27/14, 8:42 AM, David Sterba wrote:
 On Thu, Jun 26, 2014 at 03:38:33PM -0500, Eric Sandeen wrote:
 +FILE ATTRIBUTES
 +---
 +The btrfs filesystem supports setting the following file
 +attributes the `chattr`(1) utility
 +append only (a), no atime updates (A), compressed (c), no copy on write (C),
 +no dump (d), synchronous directory updates (d), immutable (i),
 +synchronous updates (S), and no compression (X).
 
 The formatting is not eye-pleasing.
 
 I've spotted a few mistakes:
 
 * 'd' is listed twice, for sync directory updates it's 'D'

Crud, sorry about that.

 * and 'X' does not mean no compression and never has, although I'd
   like to see a chattr bit for that because we have the corresponding
   inode bit

Ok, then I'm not sure what it does mean.  Supposedly these flags are supported;
via check_flags(), called by setflags(), which I was basing these on:

if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
              FS_NOATIME_FL | FS_NODUMP_FL | \
              FS_SYNC_FL | FS_DIRSYNC_FL | \
              FS_NOCOMP_FL | FS_COMPR_FL |
              FS_NOCOW_FL))

and the kernel header says that's:

#define FS_NOCOMP_FL 0x00000400 /* Don't compress */

chattr(1) says: "compression raw access (X)", and also "The 'X' attribute
is used by the experimental compression patches to indicate that a raw
contents of a compressed file can be accessed directly. It currently
may not be set or reset using chattr(1), although it can be displayed by
lsattr(1)."

Hum, ok, so we are starting to go off the rails here, aren't we ;)

e2fsprogs has this flag translation:
 { EXT2_NOCOMPR_FL, "X", "Compression_Raw_Access" },
for:
#define EXT2_NOCOMPR_FL 0x00000400 /* Access raw compressed data */

and btrfs_ioctl_setflags claims to handle it:

if (flags & FS_NOCOMP_FL) {
        ip->flags &= ~BTRFS_INODE_COMPRESS;
        ip->flags |= BTRFS_INODE_NOCOMPRESS;
...

so hopefully you can understand my confusion? ;)

The comment says:

 * The COMPRESS flag can only be changed by users, while the NOCOMPRESS
 * flag may be changed automatically if compression code won't make
 * things smaller.

(but doesn't say may *only* be...)

But OTOH, chattr won't ever even *pass* X to the fs, will it.

So I guess I'm lost.  It looks like there's code to handle an incoming
X but I don't think chattr will send it.

Do we ever get an outbound X for an opportunistically not-compressed file?
If so, maybe that still needs to be specified.

Otherwise, yeah, the *format* changes look great, thanks. ;)

-Eric


 I've checked your patches, the meaning of 'X' hasn't changed.
 
 I took the opportunity and reformatted the options:
 
 @@ -183,9 +183,24 @@ FILE ATTRIBUTES
  ---
  The btrfs filesystem supports setting the following file
  attributes the `chattr`(1) utility
 -append only (a), no atime updates (A), compressed (c), no copy on write (C),
 -no dump (d), synchronous directory updates (d), immutable (i),
 -synchronous updates (S), and no compression (X).
 +
 +*a* -- append only
 +
 +*A* -- no atime updates
 +
 +*c* -- compressed
 +
 +*C* -- no copy on write
 +
 +*d* -- no dump
 +
 +*D* -- synchronous directory updates
 +
 +*i* -- immutable
 +
 +*S* -- synchronous updates
 
  For descriptions of these attribute flags, please refer to the
  `chattr`(1) man page.
 ---
 
 looks almost the same in the manpage and gives IMO a good
 overview. For initial patch I'm ok with the descriptions, we can enhance it
 later with btrfs specifics.
 
 Are you ok with the proposed changes? (I don't want to bother with
 resending for simple changes.)
 



Re: Blocked tasks on 3.15.1

2014-06-27 Thread Rich Freeman
On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
 enough 3.16-rc2+ reports out there from folks experiencing issues with
 3.15 blocked tasks to rightfully say.

Any chance that it was backported to 3.15.2?  I'd rather not move to
mainline just for btrfs.

I got another block this morning and failed to capture a log before my
terminals gave out.  I switched back to 3.15.0 for the moment, and
we'll see if that fares any better.

Rich


Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)

2014-06-27 Thread Eric Sandeen
On 6/27/14, 10:30 AM, David Sterba wrote:
 On Fri, Jun 27, 2014 at 09:56:10AM -0500, Eric Sandeen wrote:
 * and 'X' does not mean no compression and never has, although I'd
   like to see a chattr bit for that because we have the corresponding
   inode bit

 Ok, then I'm not sure what it does mean.  Supposedly these flags are 
 supported;
 via check_flags(), called by setflags(), which I was basing these on:

 if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
               FS_NOATIME_FL | FS_NODUMP_FL | \
               FS_SYNC_FL | FS_DIRSYNC_FL | \
               FS_NOCOMP_FL | FS_COMPR_FL |
               FS_NOCOW_FL))

 and the kernel header says that's:

 #define FS_NOCOMP_FL 0x00000400 /* Don't compress */
 
 Passing this bit directly via ioctl works as expected, but to my
 knowledge there is no chattr letter allocated for it.

it's in the manpage, but as a read-only attribute, i.e. lsattr only.

 chattr(1) says: "compression raw access (X)", and also "The 'X' attribute
 is used by the experimental compression patches to indicate that a raw
 contents of a compressed file can be accessed directly. It currently
 may not be set or reset using chattr(1), although it can be displayed by
 lsattr(1)."

 Hum, ok, so we are starting to go off the rails here, aren't we ;)
 
 Yeah. And there's no support for accessing raw compressed data.
 
 e2fsprogs has this flag translation:
  { EXT2_NOCOMPR_FL, "X", "Compression_Raw_Access" },
 for:
 #define EXT2_NOCOMPR_FL 0x00000400 /* Access raw compressed data */

 and btrfs_ioctl_setflags claims to handle it:

 if (flags & FS_NOCOMP_FL) {
         ip->flags &= ~BTRFS_INODE_COMPRESS;
         ip->flags |= BTRFS_INODE_NOCOMPRESS;
  ...

 so hopefully you can understand my confusion? ;)
 
 Oh I do now, but it's ext2 fault :)

Ok but btrfs setflags tries to handle FS_NOCOMP_FL - how is that ever set?

 The comment says:

  * The COMPRESS flag can only be changed by users, while the NOCOMPRESS
  * flag may be changed automatically if compression code won't make
  * things smaller.

 (but doesn't say may *only* be...)
 
 And that's IMO right (at least I expect it to work like that), the user may
 set or drop the NOCOMPRESS flag. The comment says that it may appear
 without user interaction.
 
 But OTOH, chattr won't ever even *pass* X to the fs, will it.

 So I guess I'm lost.  It looks like there's code to handle an incoming
 X but I don't think chattr will send it.

 Do we ever get an outbound X for an opportunistically not-compressed file?
 If so, maybe that still needs to be specified.
 
 AFAICS 'X' is not listed among the standard chattr options and chattr.c
 in e2fsprogs has no support for that.
 
 There is
 
 lib/e2p/pf.c:   { EXT2_NOCOMPR_FL, "X", "Compression_Raw_Access" },
 
 but this is used only locally by print_flags.

Right, it's a read-only flag for lsattr.

 I hope this answers your questions, 'X' has no meaning for btrfs now.

The only remaining question is, why does the btrfs setflags interface
try to parse it, if nothing sends it?  Or if something does send it,
what?  And where is this all documented? ;)

btrfs tries to handle a flag value which is identical to the
'X' flag value, which lsattr/chattr says is readonly...

-Eric


[Question] Btrfs on iSCSI device

2014-06-27 Thread Zhe Zhang
Hi,

I set up 2 Linux servers to share the same device through iSCSI. Then I
created a btrfs on the device. Then I saw the problem that the 2 Linux
servers do not see a consistent file system image.

Details:
-- Server 1 running kernel 2.6.32, server 2 running 3.2.1
-- Both running btrfs v0.20-rc1
-- Server 2 has device /dev/vdc, exposed as iSCSI target
 -- Server 1 mounts the device as /dev/sda
-- Server 1 'mount /dev/sda /mnt/btrfs'; server 2 'mount /dev/vdc /mnt/btrfs',
 -- When server 1 'touch /mnt/btrfs/foo', server 2 doesn't see any
file under /mnt/btrfs
-- I created /mnt/btrfs/foo on server 2 as well; then I added some
content from both server 1 and server 2 to /mnt/btrfs/foo
-- After that each server sees the content it adds, but not the
content from the other server
-- Both server 'umount /mnt/btrfs', and mount it again
-- Then both servers see /mnt/btrfs/foo with the content added from
server 2 (I guess it's because server 2 created the foo file later
than server 1).

I did a similar test on ext4 and both servers see a consistent image
of the file system. When server 1 creates a foo file server 2
immediately sees it.

Is this how btrfs is supposed to work?

Thanks,

Zhe


Re: Blocked tasks on 3.15.1

2014-06-27 Thread Chris Murphy

On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:

 On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
 enough 3.16-rc2+ reports out there from folks experiencing issues with
 3.15 blocked tasks to rightfully say.
 
 Any chance that it was backported to 3.15.2?  I'd rather not move to
 mainline just for btrfs.

The backports don't happen that quickly. I'm uncertain about specifics but I 
think many such fixes need to be demonstrated in mainline before they get 
backported to stable.


 
 I got another block this morning and failed to capture a log before my
 terminals gave out.  I switched back to 3.15.0 for the moment, and
 we'll see if that fares any better. 

Yeah I'd start going backwards. The idea of going forwards is to hopefully get 
you unstuck or extract data where otherwise you can't, it's not really a 
recommendation for production usage. It's also often useful if you can 
reproduce the block with a current rc kernel and issue sysrq+w and post that. 
Then do your regression with an older kernel.

Chris Murphy


Re: [Question] Btrfs on iSCSI device

2014-06-27 Thread Goffredo Baroncelli
Hi,
On 06/27/2014 05:44 PM, Zhe Zhang wrote:
 Hi,
 
 I setup 2 Linux servers to share the same device through iSCSI. Then I
 created a btrfs on the device. Then I saw the problem that the 2 Linux
 servers do not see a consistent file system image.
 
 Details:
 -- Server 1 running kernel 2.6.32, server 2 running 3.2.1
 -- Both running btrfs v0.20-rc1
 -- Server 2 has device /dev/vdc, exposed as iSCSI target
  -- Server 1 mounts the device as /dev/sda
 -- Server 1 'mount /dev/sda /mnt/btrfs'; server 2 'mount /dev/vdc /mnt/btrfs',
  -- When server 1 'touch /mnt/btrfs/foo', server 2 doesn't see any
 file under /mnt/btrfs
 -- I created /mnt/btrfs/foo on server 2 as well; then I added some
 content from both server 1 and server 2 to /mnt/btrfs/foo
 -- After that each server sees the content it adds, but not the
 content from the other server
 -- Both server 'umount /mnt/btrfs', and mount it again
 -- Then both servers see /mnt/btrfs/foo with the content added from
 server 2 (I guess it's because server 2 created the foo file later
 than server 1).
 
 I did a similar test on ext4 and both servers see a consistent image
 of the file system. When server 1 creates a foo file server 2
 immediately sees it.
 
 Is this how btrfs is supposed to work?

I don't think it is possible to mount the _same device_ at the _same time_ 
on two different machines. And this doesn't depend on the filesystem.

I suspect that the fact you see it working is accidental.

When I tried this (same SCSI HD connected to two machines), I had to ensure 
that the two machines never accessed the HD at the same time.

 
 Thanks,
 
 Zhe
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)

2014-06-27 Thread Eric Sandeen
On 6/27/14, 11:10 AM, David Sterba wrote:
 On Fri, Jun 27, 2014 at 10:36:54AM -0500, Eric Sandeen wrote:

...

 btrfs tries to handle a flag value which is identical to the
 'X' flag value, which lsattr/chattr says is readonly...
 
 I'm looking at it from the kernel side, ie what's its meaning of the
 flag. The chattr tool still lives under the hood of e2fsprogs, but
 the ioctl interface is inherited to other filesystems (stating the
 obvious). e2fsprogs/chattr can decide to implement other meaning or new
 bits more or less freely (eg. there's the new 'N' flag for inlined files
 that I discovered just today while exploring the 'X' flag).

Yes, the interface originated w/ extN, but has clearly spread to other
filesystems, and spread like a weed.  ;)  It's still the de facto
interface, but looking through other filesystems, it's a bit of a disaster.
(filesystems specifying inheritance of flags they ignore, for example).

 There was a discussion at fsdevel about extending the interface or
 reworking it completely, I don't know if there's an outcome.
 
 From the btrfs side, we have the object properties that make a nice
 interface for accessing the file attributes in parallel with the chattr
 tool. The interface is currently underused so it's not possible to
 manipulate the flags yet.

or test the code, despite it being merged.  \o/  oh well...

 I'd rather move the efforts to finalize this interface than adding
 single bits of the SETFLAGS ioctl and further extensions of the
 chattr/lsattr tools.

ok.  In any case, back to the original patch:  Your changes look fine.
'X' can't be set, so leave it out.  Sorry about the 'd' vs 'D' - and I
like the new formatting.  Feel free to make those changes.

(only nitpick: is 'X' ever reported by lsattr on btrfs?  If so,
it could/should still be included)

((and a side note: I tried to change the text of the manpage to
be btrfs not btrfs-mount but that somehow broke the build, and
I didn't dig a lot further)) 

-Eric

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Question] Btrfs on iSCSI device

2014-06-27 Thread Austin S Hemmelgarn
On 2014-06-27 12:34, Goffredo Baroncelli wrote:
 Hi,
 On 06/27/2014 05:44 PM, Zhe Zhang wrote:
 Hi,

 I set up 2 Linux servers to share the same device through iSCSI. Then I
 created a btrfs on the device. Then I saw that the 2 Linux
 servers do not see a consistent file system image.

 Details:
 -- Server 1 running kernel 2.6.32, server 2 running 3.2.1
 -- Both running btrfs v0.20-rc1
 -- Server 2 has device /dev/vdc, exposed as iSCSI target
  -- Server 1 mounts the device as /dev/sda
 -- Server 1 'mount /dev/sda /mnt/btrfs'; server 2 'mount /dev/vdc 
 /mnt/btrfs',
  -- When server 1 'touch /mnt/btrfs/foo', server 2 doesn't see any
 file under /mnt/btrfs
 -- I created /mnt/btrfs/foo on server 2 as well; then I added some
 content from both server 1 and server 2 to /mnt/btrfs/foo
 -- After that each server sees the content it adds, but not the
 content from the other server
 -- Both servers 'umount /mnt/btrfs', and mount it again
 -- Then both servers see /mnt/btrfs/foo with the content added from
 server 2 (I guess it's because server 2 created the foo file later
 than server 1).

 I did a similar test on ext4 and both servers see a consistent image
 of the file system. When server 1 creates a foo file server 2
 immediately sees it.

 Is this how btrfs is supposed to work?
 
 I don't think that it is possible to mount the _same device_ at the _same
 time_ on two different machines. And this doesn't depend on the filesystem.
 
 That you see it working is, I suspect, only by chance.
 
 When I tried this (same SCSI HD connected to two machines), I had to ensure
 that the two machines never accessed the HD at the same time.
 

 Thanks,

 Zhe

 
 
If you need shared storage like that, you need to use a real cluster
filesystem like GFS2 or OCFS2, BTRFS isn't designed for any kind of
concurrent access to shared storage from separate systems.
The reason it appears to work when using iSCSI and not with directly
connected parallel SCSI or SAS is that iSCSI doesn't provide low level
hardware access.





Re: Blocked tasks on 3.15.1

2014-06-27 Thread Duncan
Chris Murphy posted on Fri, 27 Jun 2014 09:52:46 -0600 as excerpted:

 On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net
 wrote:
 
 On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
 enough 3.16-rc2+ reports out there from folks experiencing issues with
 3.15 blocked tasks to rightfully say.
 
 Any chance that it was backported to 3.15.2?  I'd rather not move to
 mainline just for btrfs.
 
 The backports don't happen that quickly.

The lockup bug that affected early 3.16 was introduced in the commit-
window pull for 3.16, so the fix for that shouldn't have needed backported 
(unless the problem commit ended up in stable too, which I doubt but 
don't know for sure).

3.15.0 didn't contain that bug, which affected me, but as I said, there 
did seem to be more blocked-task reports in 3.15, which didn't affect me.

I didn't run 3.15.1, however, staying on 3.15.0 until after 3.16-rc2 
fixed the earlier 3.16-pre series bug that had kept me from the 3.16 
series until then.  So anything that might have affected the 3.15 stable 
series after 3.15.0, I wouldn't know about.

If I'm not mistaken the fix for the 3.16 series bug was:

ea4ebde02e08558b020c4b61bb9a4c0fcf63028e

Btrfs: fix deadlocks with trylock on tree nodes.

But I think the 3.16 commit-window changes introducing the bug weren't 
btrfs specific but instead at the generic vfs level.  If that's the case, 
then it's possible that the bug was there before 3.16's commit window and 
might have been triggering some of the 3.15 reports as well, and the 3.16 
vfs change simply made it much worse.

IOW, I don't know whether that 3.16 series fix will help 3.15 or not, but 
I don't believe it'll hurt, and it /might/ help.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [Question] Btrfs on iSCSI device

2014-06-27 Thread Zhe Zhang
On Fri, Jun 27, 2014 at 1:15 PM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2014-06-27 12:34, Goffredo Baroncelli wrote:
 Hi,
 On 06/27/2014 05:44 PM, Zhe Zhang wrote:
 Hi,

 I set up 2 Linux servers to share the same device through iSCSI. Then I
 created a btrfs on the device. Then I saw that the 2 Linux
 servers do not see a consistent file system image.

 Details:
 -- Server 1 running kernel 2.6.32, server 2 running 3.2.1
 -- Both running btrfs v0.20-rc1
 -- Server 2 has device /dev/vdc, exposed as iSCSI target
  -- Server 1 mounts the device as /dev/sda
 -- Server 1 'mount /dev/sda /mnt/btrfs'; server 2 'mount /dev/vdc 
 /mnt/btrfs',
  -- When server 1 'touch /mnt/btrfs/foo', server 2 doesn't see any
 file under /mnt/btrfs
 -- I created /mnt/btrfs/foo on server 2 as well; then I added some
 content from both server 1 and server 2 to /mnt/btrfs/foo
 -- After that each server sees the content it adds, but not the
 content from the other server
 -- Both servers 'umount /mnt/btrfs', and mount it again
 -- Then both servers see /mnt/btrfs/foo with the content added from
 server 2 (I guess it's because server 2 created the foo file later
 than server 1).

 I did a similar test on ext4 and both servers see a consistent image
 of the file system. When server 1 creates a foo file server 2
 immediately sees it.

 Is this how btrfs is supposed to work?

 I don't think that it is possible to mount the _same device_ at the _same
 time_ on two different machines. And this doesn't depend on the filesystem.

 That you see it working is, I suspect, only by chance.

 When I tried this (same SCSI HD connected to two machines), I had to ensure
 that the two machines never accessed the HD at the same time.


 Thanks,

 Zhe



 If you need shared storage like that, you need to use a real cluster
 filesystem like GFS2 or OCFS2, BTRFS isn't designed for any kind of
 concurrent access to shared storage from separate systems.
 The reason it appears to work when using iSCSI and not with directly
 connected parallel SCSI or SAS is that iSCSI doesn't provide low level
 hardware access.


I did more testing with ext4, and the results support what Goffredo and Austin
said above. The error message is "cannot access xxx: Input/output error".

It seems to me that both servers hold some file system data structures
in memory and eventually conflict with each other (like writing inode
info to the same blocks).
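
The conflict described above can be illustrated with a toy model. This is only a sketch under stated assumptions: a temporary file stands in for the shared block device, two shell variables stand in for each server's in-memory copy of one metadata block, and none of it is btrfs, ext4, or iSCSI code.

```shell
#!/bin/sh
# Toy model of two hosts caching the same on-disk block (hypothetical, not btrfs code).
disk=$(mktemp)                 # stands in for the shared iSCSI device
echo "entries=" > "$disk"

# Each host reads the block once and keeps its own in-memory copy.
host1_cache=$(cat "$disk")
host2_cache=$(cat "$disk")

# Each host modifies only its private copy, unaware of the other.
host1_cache="${host1_cache}foo-from-server1,"
host2_cache="${host2_cache}foo-from-server2,"

# Both write their copy back; the later write silently replaces the earlier one.
echo "$host1_cache" > "$disk"
echo "$host2_cache" > "$disk"

final=$(cat "$disk")
echo "$final"
rm -f "$disk"
```

In this model only the entry written last survives, because neither side ever learns that the block changed underneath it. Cluster filesystems avoid this by coordinating access between nodes, which a single-host filesystem does not attempt.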


Re: Blocked tasks on 3.15.1

2014-06-27 Thread Rich Freeman
On Fri, Jun 27, 2014 at 11:52 AM, Chris Murphy li...@colorremedies.com wrote:
 On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:


 I got another block this morning and failed to capture a log before my
 terminals gave out.  I switched back to 3.15.0 for the moment, and
 we'll see if that fares any better.

 Yeah I'd start going backwards. The idea of going forwards is to
 hopefully get you unstuck or extract data where otherwise you can't,
 it's not really a recommendation for production usage. It's also often
 useful if you can reproduce the block with a current rc kernel and
 issue sysrq+w and post that. Then do your regression with an older
 kernel.

So, obviously I'm getting my money's worth from the btrfs team, but
neither is always a great option as neither involves me running a
stable kernel.  3.15.0 contains CVE-2014-4014, although I'm running a
version patched for that vulnerability.  If I go back any further I'd
probably have to backport it myself, and I only know about it because
my distro patched that CVE on 3.15.0 before moving to 3.15.1.

Running 3.16 doesn't bother me much from a btrfs standpoint, but it
means I'm getting unstable updates on all the other modules as well.
It is just more to deal with.

I might give 3.15.2 a shot and see what happens, and I can always fall
back to 3.15.0 again.

Rich


Cannot delete snapshot

2014-06-27 Thread m...@gmx.net
Hello, =)

I got a problem with a simple backup bash script. It creates a snapshot and 
then backs it up.
The user (Ubuntu 12.04, 64-bit) interrupted the script with CTRL+C shortly 
after it started.
Then the machine was rebooted several times. Now these snapshots cannot be 
deleted anymore
and new ones can't be taken.


This is what the script does before it starts the backup process:

mount /mnt/big -o remount -o rw # /mnt/big is a partition for backups, which is read-only
mkdir /mnt/backup_root
mount /dev/sda1 /mnt/backup_root # /dev/sda1 on /home type btrfs (rw,subvol=@home)
btrfs subvolume snapshot /mnt/backup_root/@ /mnt/backup_root/@snapshot_backup_root
btrfs subvolume snapshot /mnt/backup_root/@home /mnt/backup_root/@snapshot_backup_home
mkdir /mnt/backup
mount --bind /mnt/backup_root/@snapshot_backup_root /mnt/backup/
mount --bind /mnt/backup_root/@snapshot_backup_home/ /mnt/backup/home/
mount --bind /boot /mnt/backup/boot/

This is what it didn't do, as it was interrupted:
umount /mnt/backup/boot/ 1> /dev/null 2> /dev/null
umount /mnt/backup/home/ 1> /dev/null 2> /dev/null
umount /mnt/backup 1> /dev/null 2> /dev/null
rmdir /mnt/backup 1> /dev/null 2> /dev/null
btrfs subvolume delete /mnt/backup_root/@snapshot_backup_home 1> /dev/null 2> /dev/null
btrfs subvolume delete /mnt/backup_root/@snapshot_backup_root 1> /dev/null 2> /dev/null
umount /mnt/backup_root 1> /dev/null 2> /dev/null
rmdir /mnt/backup_root 1> /dev/null 2> /dev/null
mount /mnt/big -o remount -o ro 1> /dev/null 2> /dev/null
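
The teardown commands above never ran because the script was interrupted. One common way to make that less likely is to register the teardown in a trap. What follows is a minimal sketch, not the poster's script: a scratch directory stands in for the mounts and snapshots so it can run without root, and the function name run_backup and its "fail" switch are hypothetical.

```shell
#!/bin/sh
# Sketch: make sure teardown runs even when the script is interrupted.
# A scratch directory stands in for the mounts and snapshots; the real
# umount / btrfs subvolume delete commands are only named in comments.
run_backup() (
    workdir=$1

    cleanup() {
        # Real script: umount /mnt/backup/..., btrfs subvolume delete ..., rmdir ...
        rm -rf "$workdir/mounted"
    }
    trap cleanup EXIT            # runs on normal exit and after the handlers below call exit
    trap 'exit 130' INT          # Ctrl+C
    trap 'exit 143' TERM

    mkdir -p "$workdir/mounted"  # stands in for mount + snapshot creation

    # Hypothetical switch to simulate a failure in the middle of the backup.
    if [ "${2:-}" = "fail" ]; then
        exit 1
    fi
    # ... the actual backup would run here ...
)
```

A trap cannot help after SIGKILL, a crash, or a power loss, so a script like this would still need to cope with leftovers from an earlier run when it starts.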


This is what the file-system is supposed to look like:
# btrfs subvolume list '/'
ID 256 top level 5 path @
ID 257 top level 5 path @home

and this is what it looks like instead:
# btrfs subvolume list '/'
ID 256 top level 5 path @
ID 257 top level 5 path @home
ID 258 top level 5 path @snapshot_backup_root
ID 259 top level 5 path @snapshot_backup_home
ID 260 top level 5 path @snapshot_backup_root/@
ID 261 top level 5 path @snapshot_backup_home/@home
# btrfs subvolume list '/mnt'
ERROR: '/mnt' is not a subvolume
# btrfs subvolume list '/mnt/backup_root'
ERROR: error accessing '/mnt/backup_root'


When running the script again, it prints:
ERROR: cannot snapshot '/mnt/backup_root/@'
and
ERROR: cannot snapshot '/mnt/backup_root/@home'
as they still seem to exist.

But deleting the snapshots fails as well.

btrfs subvolume delete  '/@snapshot_backup_root'
btrfs subvolume delete  '/@snapshot_backup_home'
btrfs subvolume delete  '/@snapshot_backup_root/@'
btrfs subvolume delete  '/@snapshot_backup_home/@home'
btrfs subvolume delete  '/@snapshot_backup_root'
btrfs subvolume delete  '/@snapshot_backup_home'

didn't work (ERROR: error accessing '/@snapshot_backup_root').
Simply running the unmount/delete snapshot part of the script didn't work either.

How can I get rid of those snapshots (ID 258-261)?

I tried:
# mkdir /tmp/t
# mount /dev/sda1 /tmp/t  -o subvol=/
# ls /tmp/t

@
@home
@snapshot_backup_home
@snapshot_backup_root

# btrfs subvol delete /tmp/t/@snapshot_backup_home
Delete subvolume '/tmp/t/@snapshot_backup_home'
ERROR: cannot delete '/tmp/t/@snapshot_backup_home'


Needless to say that such an error message isn't really helpful.


I'd appreciate any help. As I'm NOT SUBSCRIBED TO THE MAILING LIST,
please CC to
  m...@gmx.net

Thank you very much! :-)



PS.: I didn't append any dmesg and such as they likely won't be of any use,
 given the very specific problem. If needed I can supply them.





Also seeing full deadlocks with 3.15.1

2014-06-27 Thread Marc MERLIN
My laptop deadlocked some more times (everything works until it needs to
touch the filesystem, and then it's deadlocked).
Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and
netconsole eats half of it because it goes too fast for UDP apparently

Now, I just captured that on my server with serial console.

11005  1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
14441  1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
17045  1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
22261  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
22292  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8

19911    09:29:35 wait_current_trans.isra.15 rm -f -- 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
22848  1-05:18:35 wait_current_trans.isra.15 rm -f -- 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz

Those are 2 different filesystems (one single device mapper disk, the other one 
is btrfs raid1), so I'm not sure which one of the 2 caused the problem, but I'm 
perplexed as to why one would then hang the other, unless they both hit the 
same bug?

The sysrq-w output is here:
http://marc.merlins.org/tmp/btrfs-hang.txt

but here is one hung process:
 zmaD 0003 0 22292  1 0x20020084
  880074733bb0 0082 8800c933f270 880074733fd8
  8801853b4610 000141c0 8801aac60f00 880036caa9e8
   880036caa800 8801db59f0c0 880074733bc0
 Call Trace:
  [8161d3c6] schedule+0x73/0x75
  [8122a87b] wait_current_trans.isra.15+0x98/0xf4
  [810847ed] ? finish_wait+0x65/0x65
  [8122bd95] start_transaction+0x498/0x4fc
  [8122be14] btrfs_start_transaction+0x1b/0x1d
  [8123602a] btrfs_create+0x3c/0x1ce
  [81298985] ? security_inode_permission+0x1c/0x23
  [8115e93e] ? __inode_permission+0x79/0xa4
  [8115fbfc] vfs_create+0x66/0x8c
  [8116095e] do_last+0x5af/0xa23
  [81161009] path_openat+0x237/0x4de
  [81162408] do_filp_open+0x3a/0x7f
  [8161faeb] ? _raw_spin_unlock+0x17/0x2a
  [8116c3eb] ? __alloc_fd+0xea/0xf9
  [8115499d] do_sys_open+0x70/0xff
  [81194e20] compat_SyS_open+0x1b/0x1d
  [8162842c] sysenter_dispatch+0x7/0x21

As per the other thread, I'm happy to test a patch against 3.15, but not hot 
about switching to a likely even less stable 3.16 since it's a real server with 
real data.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


nossd option ignored

2014-06-27 Thread Roman Mamedov
Hello,

With kernel 3.14.5...

$ sudo umount /mnt/net/alpha/11
umount: /mnt/net/alpha/11: not mounted

$ sudo mount -o inode_cache,space_cache,compress=lzo,noatime,nossd,skip_balance 
/dev/nbd11 /mnt/net/alpha/11

$ sudo mount | grep nbd11
/dev/nbd11 on /mnt/net/alpha/11 type btrfs 
(rw,noatime,compress=lzo,ssd,space_cache,inode_cache,skip_balance)

$ dmesg | tail
...
[1353819.363462] BTRFS: device fsid 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99 devid 
1 transid 25041 /dev/nbd11
[1353819.364668] BTRFS info (device nbd11): enabling inode map caching
[1353819.364674] BTRFS info (device nbd11): disk space caching is enabled
[1353821.784617] BTRFS: detected SSD devices, enabling SSD mode

--
I'm certain the nossd option used to work (prevent the SSD mode) with this
exact same configuration on older kernels. Any idea why it doesn't now?
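
The check done by eye above (looking for ssd versus nossd in the mount output) can be scripted. This is a small sketch, assuming only that the options arrive as a comma-separated string as printed by mount; the helper name has_mount_opt is hypothetical. It matches whole option names, so looking for ssd does not accidentally match nossd.

```shell
#!/bin/sh
# Check whether an exact option name appears in a comma-separated mount option string.
has_mount_opt() {
    # $1: option string as printed by mount or /proc/mounts, $2: option to look for
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Option string copied from the mount output in the message above.
opts="rw,noatime,compress=lzo,ssd,space_cache,inode_cache,skip_balance"
if has_mount_opt "$opts" nossd; then
    echo "nossd is in effect"
else
    echo "nossd is NOT in effect"
fi
```

With the option string from the report, the second branch is taken, which matches what the poster observed.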

-- 
With respect,
Roman




[PATCH] btrfs: fix nossd and ssd_spread mount option regression

2014-06-27 Thread Eric Sandeen
The commit

0780253 btrfs: Cleanup the btrfs_parse_options for remount.

broke ssd options quite badly; it stopped making ssd_spread
imply ssd, and it made nossd unsettable.

Put things back at least as well as they were before
(though ssd mount option handling is still pretty odd:
# mount -o nossd,ssd_spread works?)

Reported-by: Roman Mamedov r...@romanrm.net
Signed-off-by: Eric Sandeen sand...@redhat.com
---

I've tested this insofar as I was actually able to mount with
nossd, and see it reflected in /proc/mounts.

If SSD_SPREAD is set, show_options() won't show you the ssd
option, so that's not totally obvious.  Still, this is what
the code did before the regression.

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4662d92..0e8edcc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -522,9 +522,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_ssd_spread:
 			btrfs_set_and_info(root, SSD_SPREAD,
 					   "use spread ssd allocation scheme");
+			btrfs_set_opt(info->mount_opt, SSD);
 			break;
 		case Opt_nossd:
-			btrfs_clear_and_info(root, NOSSD,
+			btrfs_set_and_info(root, NOSSD,
 					     "not using ssd allocation scheme");
 			btrfs_clear_opt(info->mount_opt, SSD);
 			break;
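
The two cases touched by the diff can be modelled outside the kernel to see what the restored behaviour amounts to. This is a toy model in shell, not kernel code: the bit values and the function name parse_ssd_opts are made up for the sketch, and only the two options shown in the diff are handled.

```shell
#!/bin/sh
# Toy model of the two cases in the diff. The bit values are invented for
# this sketch; the kernel's real flag values do not appear in the thread.
SSD=1
SSD_SPREAD=2
NOSSD=4

parse_ssd_opts() {
    # $1: comma-separated mount options; prints the resulting flag word
    flags=0
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            ssd_spread) flags=$(( flags | SSD_SPREAD | SSD )) ;;   # ssd_spread implies ssd
            nossd)      flags=$(( (flags | NOSSD) & ~SSD )) ;;     # set NOSSD, clear SSD
        esac
    done
    IFS=$old_ifs
    echo "$flags"
}

parse_ssd_opts ssd_spread          # SSD_SPREAD and SSD set
parse_ssd_opts nossd               # only NOSSD set
parse_ssd_opts nossd,ssd_spread    # all three set
```

In this model the last call leaves NOSSD, SSD_SPREAD and SSD all set at once, which appears to be the oddity the commit message points at with "mount -o nossd,ssd_spread works?".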



Re: Can't mount subvolume with ro option

2014-06-27 Thread Chris Murphy

On Jun 27, 2014, at 2:07 PM, Sébastien ROHAUT sebastien.roh...@free.fr wrote:

 Hi,
 
 In the wiki, it's said we can mount subvolumes with different mount options. 
 nosuid, nodev, rw and ro are listed, as valid generic mount options.

This might require 3.15. I don't recall it working with early 3.14 kernels, but 
by 3.14.3 I'd moved onto testing 3.15.

[root@rawhide ~]# mount /dev/sda3 /mnt
[root@rawhide ~]# btrfs subvol create /mnt/test
Create subvolume '/mnt/test'
[root@rawhide ~]# umount /mnt
[root@rawhide ~]# mount -o ro,subvol=test /dev/sda3 /mnt
[root@rawhide ~]# mount | grep btrfs
/dev/sda3 on / type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /home type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /var type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /boot type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /mnt type btrfs (ro,relatime,seclabel,space_cache,autodefrag)
[root@rawhide ~]# cat /proc/self/mountinfo | grep btrfs
58 0 0:33 /root / rw,relatime shared:1 - btrfs /dev/sda3 
rw,seclabel,space_cache,autodefrag
72 58 0:33 /home /home rw,relatime shared:29 - btrfs /dev/sda3 
rw,seclabel,space_cache,autodefrag
74 58 0:33 /var /var rw,relatime shared:30 - btrfs /dev/sda3 
rw,seclabel,space_cache,autodefrag
76 58 0:33 /boot /boot rw,relatime shared:31 - btrfs /dev/sda3 
rw,seclabel,space_cache,autodefrag
84 58 0:33 /test /mnt ro,relatime shared:35 - btrfs /dev/sda3 
rw,seclabel,space_cache,autodefrag

So on my end it seems like it's working correctly with 3.15.


Chris Murphy


Re: Can't mount subvolume with ro option

2014-06-27 Thread Chris Murphy

On Jun 27, 2014, at 4:08 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Jun 27, 2014, at 2:07 PM, Sébastien ROHAUT sebastien.roh...@free.fr 
 wrote:
 
 Hi,
 
 In the wiki, it's said we can mount subvolumes with different mount options. 
 nosuid, nodev, rw and ro are listed, as valid generic mount options.
 
 This might require 3.15. I don't recall it working with early 3.14 kernels, 
 but by 3.14.3 I'd moved onto testing 3.15.


[root@f20v ~]# mount /dev/sda3 /mnt
[root@f20v ~]# btrfs subvol create /mnt/test
Create subvolume '/mnt/test'
[root@f20v ~]# umount /mnt
[root@f20v ~]# mount -o ro,subvol=test /dev/sda3 /mnt
mount: /dev/sda3 is already mounted or /mnt busy
   /dev/sda3 is already mounted on /
   /dev/sda3 is already mounted on /home
   /dev/sda3 is already mounted on /var
   /dev/sda3 is already mounted on /boot
[root@f20v ~]# uname -r
3.14.6-200.fc20.x86_64


I don't know if this feature will be backported to stable kernels. If not, then 
probably the wiki should say it's a 3.15+ feature.
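
A script that depends on this behaviour could check the running kernel first. The sketch below is only an illustration: the 3.15 threshold is the tentative observation from this message, not a documented requirement, the helper name kernel_at_least is hypothetical, and it relies on sort -V from GNU coreutils.

```shell
#!/bin/sh
# Compare a kernel release string against a minimum version.
# Relies on 'sort -V' (version sort), which GNU coreutils provides.
kernel_at_least() {
    # $1: minimum version, e.g. 3.15; $2: release string, e.g. output of 'uname -r'
    min=$1
    rel=${2%%-*}          # drop a distro suffix such as -200.fc20.x86_64
    [ "$(printf '%s\n%s\n' "$min" "$rel" | sort -V | head -n 1)" = "$min" ]
}

# Release string taken from the uname -r output above.
if kernel_at_least 3.15 "3.14.6-200.fc20.x86_64"; then
    echo "per-subvolume ro mount may work"
else
    echo "kernel older than 3.15"
fi
```

In real use the second argument would be "$(uname -r)" rather than a literal string.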

Chris Murphy



Re: Also seeing full deadlocks with 3.15.1

2014-06-27 Thread Marc MERLIN
On Fri, Jun 27, 2014 at 02:50:10PM -0700, ronnie sahlberg wrote:
  If I don't hear anything by the end of today, I'll just delete the
  filesystem and start over.
 
 At some stage it would be nice to see not only fixes but also changes
 to fsck to make it able to repair these problems.
 Blow it away and create a new filesystem from scratch is sub-optimal.

I don't think you'll find disagreement from me or anyone here :)

But I'd go one step further. The filesystem is not corrupted as far as I
can tell, I'm happily copying data off it in ro,recovery mode (to
prevent background btrfs code from trying to do stuff and trip over
itself again).

The problem in my experience so far is that btrfs isn't stabilizing at
all. Some bugs are fixed, other things are changed, and new ones are
added.
I've not had a single version of btrfs in the last 4 kernels that didn't
deadlock and/or trip over itself (apparently from evolving or
balancing/filling filesystems into states where it can't handle them
properly anymore).

I really really wish we had a kernel release with only stabilizations
and where all recent deadlock and corruption problems (on newly created
filesystems) would be handled.
Right now, general state is bad enough that you can't tell if you hit a
new bug, or if it's an old bug that hasn't been fixed yet and developers
can't easily know if newer kernels have introduced regressions or not
since the general state of performance and stability isn't good across
all recent kernel versions.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Also seeing full deadlocks with 3.15.1

2014-06-27 Thread Josef Bacik

On 06/27/2014 11:50 AM, Marc MERLIN wrote:

My laptop deadlocked some more times (everything works until it needs to
touch the filesystem, and then it's deadlocked).
Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and
netconsole eats half of it because it goes too fast for UDP apparently

Now, I just captured that on my server with serial console.

11005  1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
14441  1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
17045  1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
22261  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
22292  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8

19911    09:29:35 wait_current_trans.isra.15 rm -f -- 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
22848  1-05:18:35 wait_current_trans.isra.15 rm -f -- 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz

Those are 2 different filesystems (one single device mapper disk, the other one 
is btrfs raid1), so I'm not sure which one of the 2 caused the problem, but I'm 
perplexed as to why one would then hang the other, unless they both hit the 
same bug?

The sysrq-w output is here:
http://marc.merlins.org/tmp/btrfs-hang.txt

but here is one hung process:
  zma   D 0003 0 22292  1 0x20020084
   880074733bb0 0082 8800c933f270 880074733fd8
   8801853b4610 000141c0 8801aac60f00 880036caa9e8
    880036caa800 8801db59f0c0 880074733bc0
  Call Trace:
   [8161d3c6] schedule+0x73/0x75
   [8122a87b] wait_current_trans.isra.15+0x98/0xf4
   [810847ed] ? finish_wait+0x65/0x65
   [8122bd95] start_transaction+0x498/0x4fc
   [8122be14] btrfs_start_transaction+0x1b/0x1d
   [8123602a] btrfs_create+0x3c/0x1ce
   [81298985] ? security_inode_permission+0x1c/0x23
   [8115e93e] ? __inode_permission+0x79/0xa4
   [8115fbfc] vfs_create+0x66/0x8c
   [8116095e] do_last+0x5af/0xa23
   [81161009] path_openat+0x237/0x4de
   [81162408] do_filp_open+0x3a/0x7f
   [8161faeb] ? _raw_spin_unlock+0x17/0x2a
   [8116c3eb] ? __alloc_fd+0xea/0xf9
   [8115499d] do_sys_open+0x70/0xff
   [81194e20] compat_SyS_open+0x1b/0x1d
   [8162842c] sysenter_dispatch+0x7/0x21

As per the other thread, I'm happy to test a patch against 3.15, but not hot 
about switching to a likely even less stable 3.16 since it's a real server with 
real data.



A few other people have complained about this. I've not been able to reproduce
it, but I have a patch you can try.  It will make it so the box doesn't deadlock
anymore, but I still need the output; look for "timed out", that's when you need
to dump the logs and send them to me.  The patch is here


http://ur1.ca/hlj6d

Thanks,

Josef


Re: [Question] Btrfs on iSCSI device

2014-06-27 Thread Russell Coker
On Fri, 27 Jun 2014 18:34:34 Goffredo Baroncelli wrote:
 I don't think that it is possible to mount the _same device_ at the _same
 time_ on two different machines. And this doesn't depend on the filesystem.

If you use a clustered filesystem then you can safely mount it on multiple 
machines.

If you use a non-clustered filesystem it can still mount and even appear to 
work for a while.  It's surprising how many writes you can make to a dual-
mounted filesystem that's not designed for such things before you get a 
totally broken filesystem.

On Fri, 27 Jun 2014 13:15:16 Austin S Hemmelgarn wrote:
 The reason it appears to work when using iSCSI and not with directly
 connected parallel SCSI or SAS is that iSCSI doesn't provide low level
 hardware access.

I've tried this with dual-attached FC and had no problems mounting.  In what 
way is directly connected SCSI different from FC?

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/



Re: Also seeing full deadlocks with 3.15.1

2014-06-27 Thread Marc MERLIN
On Fri, Jun 27, 2014 at 03:36:08PM -0700, Josef Bacik wrote:
 On 06/27/2014 11:50 AM, Marc MERLIN wrote:
 My laptop deadlocked some more times (everything works until it needs to
 touch the filesystem, and then it's deadlocked).
 Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and
 netconsole eats half of it because it goes too fast for UDP apparently
 
 Now, I just captured that on my server with serial console.
 
 11005  1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
 14441  1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
 17045  1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
 22261  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
 22292  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8
 
 19911    09:29:35 wait_current_trans.isra.15 rm -f -- 
 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 
 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
 22848  1-05:18:35 wait_current_trans.isra.15 rm -f -- 
 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 
 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz
 
 Those are 2 different filesystems (one single device mapper disk, the other 
 one is btrfs raid1), so I'm not sure which one of the 2 caused the problem, 
 but I'm perplexed as to why one would then hang the other, unless they both 
 hit the same bug?
 
 The sysrq-w output is here:
 http://marc.merlins.org/tmp/btrfs-hang.txt
 
 but here is one hung process:
   zmaD 0003 0 22292  1 0x20020084
880074733bb0 0082 8800c933f270 880074733fd8
8801853b4610 000141c0 8801aac60f00 880036caa9e8
 880036caa800 8801db59f0c0 880074733bc0
   Call Trace:
[8161d3c6] schedule+0x73/0x75
[8122a87b] wait_current_trans.isra.15+0x98/0xf4
[810847ed] ? finish_wait+0x65/0x65
[8122bd95] start_transaction+0x498/0x4fc
[8122be14] btrfs_start_transaction+0x1b/0x1d
[8123602a] btrfs_create+0x3c/0x1ce
[81298985] ? security_inode_permission+0x1c/0x23
[8115e93e] ? __inode_permission+0x79/0xa4
[8115fbfc] vfs_create+0x66/0x8c
[8116095e] do_last+0x5af/0xa23
[81161009] path_openat+0x237/0x4de
[81162408] do_filp_open+0x3a/0x7f
[8161faeb] ? _raw_spin_unlock+0x17/0x2a
[8116c3eb] ? __alloc_fd+0xea/0xf9
[8115499d] do_sys_open+0x70/0xff
[81194e20] compat_SyS_open+0x1b/0x1d
[8162842c] sysenter_dispatch+0x7/0x21
 
 As per the other thread, I'm happy to test a patch against 3.15, but not hot 
 about switching to a likely even less stable 3.16 since it's a real server 
 with real data.
 
 
 A few other people have complained about this. I've not been able to reproduce
 it, but I have a patch you can try.  It will make it so the box doesn't
 deadlock anymore, but I still need the output; look for "timed out", that's when
 you need to dump the logs and send them to me.  The patch is here

Mmmh, I applied the patch, but now I'm getting tens of thousands of the lines 
below.
The machine is so unresponsive (due to serial port speed limitation and
amount of console spamming) that I cannot even ssh into it.
Example output below. I have to back that kernel out, it's unusable and
I'm not sure what output I can get you out of it.

[ 1313.747004] looking up page 46 on inode 8801ac3e9d68
[ 1313.747006] created a page, should be locked ? eac6d480
[ 1313.747006] looking up page 47 on inode 8801ac3e9d68
[ 1313.747008] created a page, should be locked ? eac6d4b8
[ 1313.747009] looking up page 48 on inode 8801ac3e9d68
[ 1313.747011] created a page, should be locked ? eac75ad0
[ 1313.747012] looking up page 49 on inode 8801ac3e9d68
[ 1313.747013] created a page, should be locked ? eac75b08
[ 1313.747014] looking up page 50 on inode 8801ac3e9d68
[ 1313.747016] created a page, should be locked ? eac5d420
[ 1313.747017] looking up page 51 on inode 8801ac3e9d68
[ 1313.747018] created a page, should be locked ? eac5d458
[ 1313.747019] looking up page 52 on inode 8801ac3e9d68
[ 1313.747021] created a page, should be locked ? eace4f00
[ 1313.747022] looking up page 53 on inode 8801ac3e9d68
[ 1313.747023] created a page, should be locked ? eace4f38
[ 1313.747024] looking up page 54 on inode 8801ac3e9d68
[ 1313.747026] created a page, should be locked ? eac989f0
[ 1313.747027] looking up page 55 on inode 8801ac3e9d68
[ 1313.747029] created a page, should be locked ? eac98a28
[ 1375.660075] dropping 

[PATCH] Btrfs: make sure to use btrfs_header_owner when freeing tree block

2014-06-27 Thread Josef Bacik
Mark noticed that his qgroup accounting for snapshot deletion wasn't working
properly on a particular file system.  Turns out we pass the root->objectid of
the root we are deleting to btrfs_free_extent, and use that root always when we
call btrfs_free_tree_block.  This isn't correct, the owner must match the
btrfs_header_owner() of the eb.  So to fix this we need to use that when we call
btrfs_free_extent, and we also need to use btrfs_header_owner(eb) in
btrfs_free_tree_block as the root we pass in may not be the owner in the case of
snapshot delete (though it is for all the normal cases which is why it wasn't
noticed before.)  With this patch on top of Mark's snapshot delete patch
everything is working a-ok.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7671b15..7f9bb7c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6189,7 +6189,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
 					buf->start, buf->len,
-					parent, root->root_key.objectid,
+					parent, btrfs_header_owner(eb),
 					btrfs_header_level(buf),
 					BTRFS_DROP_DELAYED_REF, NULL, 0);
 		BUG_ON(ret); /* -ENOMEM */
@@ -7925,7 +7925,8 @@ skip:
 		}
 
 		ret = btrfs_free_extent(trans, root, bytenr, blocksize, parent,
-				root->root_key.objectid, level - 1, 0, 0);
+				btrfs_header_owner(next), level - 1, 0,
+				0);
 		BUG_ON(ret); /* -ENOMEM */
 	}
 	btrfs_tree_unlock(next);
-- 
2.0.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Also seeing full deadlocks with 3.15.1

2014-06-27 Thread Josef Bacik

On 06/27/2014 04:59 PM, Marc MERLIN wrote:

On Fri, Jun 27, 2014 at 03:36:08PM -0700, Josef Bacik wrote:

On 06/27/2014 11:50 AM, Marc MERLIN wrote:

My laptop deadlocked some more times (everything works until it needs to
touch the filesystem, and then it's deadlocked).
Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and
netconsole eats half of it because it goes too fast for UDP apparently

Now, I just captured that on my server with serial console.

11005  1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
14441  1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
17045  1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
22261  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
22292  2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8

19911    09:29:35 wait_current_trans.isra.15 rm -f -- 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 
/mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
22848  1-05:18:35 wait_current_trans.isra.15 rm -f -- 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 
mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz

Those are 2 different filesystems (one single device mapper disk, the other one 
is btrfs raid1), so I'm not sure which one of the 2 caused the problem, but I'm 
perplexed as to why one would then hang the other, unless they both hit the 
same bug?

The sysrq-w output is here:
http://marc.merlins.org/tmp/btrfs-hang.txt

but here is one hung process:
  zma   D 0003 0 22292  1 0x20020084
   880074733bb0 0082 8800c933f270 880074733fd8
   8801853b4610 000141c0 8801aac60f00 880036caa9e8
    880036caa800 8801db59f0c0 880074733bc0
  Call Trace:
   [8161d3c6] schedule+0x73/0x75
   [8122a87b] wait_current_trans.isra.15+0x98/0xf4
   [810847ed] ? finish_wait+0x65/0x65
   [8122bd95] start_transaction+0x498/0x4fc
   [8122be14] btrfs_start_transaction+0x1b/0x1d
   [8123602a] btrfs_create+0x3c/0x1ce
   [81298985] ? security_inode_permission+0x1c/0x23
   [8115e93e] ? __inode_permission+0x79/0xa4
   [8115fbfc] vfs_create+0x66/0x8c
   [8116095e] do_last+0x5af/0xa23
   [81161009] path_openat+0x237/0x4de
   [81162408] do_filp_open+0x3a/0x7f
   [8161faeb] ? _raw_spin_unlock+0x17/0x2a
   [8116c3eb] ? __alloc_fd+0xea/0xf9
   [8115499d] do_sys_open+0x70/0xff
   [81194e20] compat_SyS_open+0x1b/0x1d
   [8162842c] sysenter_dispatch+0x7/0x21

As per the other thread, I'm happy to test a patch against 3.15, but not hot 
about switching to a likely even less stable 3.16 since it's a real server with 
real data.



A few other people have complained about this, I've not been able to reproduce
it but I have a patch you can try.  It will make it so the box doesn't deadlock
anymore but I still need the output, look for "timed out", that's when you need
to dump the logs and send it to me.  The patch is here


Mmmh, I applied the patch, but now I'm getting tens of thousands of the lines 
below.
The machine is so unresponsive (due to serial port speed limitation and
amount of console spamming) that I cannot even ssh into it.
Example output below. I have to back that kernel out, it's unusable and
I'm not sure what output I can get you out of it.


Oh yeah I should have mentioned that, it's going to spit out a metric shittone
of stuff.  No worries, you had a lot more info in your sysrq+w, I'm hoping I can
get this to reproduce next week.  Thanks,

Josef


Re: [PATCH] Btrfs: make sure to use btrfs_header_owner when freeing tree block V2

2014-06-27 Thread Josef Bacik

On 06/27/2014 05:05 PM, Josef Bacik wrote:

Mark noticed that his qgroup accounting for snapshot deletion wasn't working
properly on a particular file system.  Turns out we pass the root->objectid of
the root we are deleting to btrfs_free_extent, and use that root always when we
call btrfs_free_tree_block.  This isn't correct, the owner must match the
btrfs_header_owner() of the eb.  So to fix this we need to use that when we call
btrfs_free_extent, and we also need to use btrfs_header_owner(eb) in
btrfs_free_tree_block as the root we pass in may not be the owner in the case of
snapshot delete (though it is for all the normal cases which is why it wasn't
noticed before.)  With this patch on top of Mark's snapshot delete patch
everything is working a-ok.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
V1->V2: this one actually compiles.



Huh I may be completely full of crap here, let's just ignore all post 5pm 
Friday patches from me for now.  Thanks,

Josef



Re: Blocked tasks on 3.15.1

2014-06-27 Thread Chris Samuel
On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:

 If I'm not mistaken the fix for the 3.16 series bug was:
 
 ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
 
 Btrfs: fix deadlocks with trylock on tree nodes.

That patch applies cleanly to 3.15.2, so if it is indeed the fix it should 
probably go to -stable for the next 3.15 release.

Unfortunately my test system died a while ago (hardware problem) and I've not 
been able to resurrect it yet.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





RAID1 3+ drives

2014-06-27 Thread Zack Coffey
Can I get more protection by using more than 2 drives?

I had an onboard RAID a few years back that would let me use RAID1
across up to 4 drives.

Apologies if this has been covered already, I don't recall seeing
anything saying yay or nay.


Re: RAID1 3+ drives

2014-06-27 Thread Russell Coker
On Fri, 27 Jun 2014 20:30:32 Zack Coffey wrote:
 Can I get more protection by using more than 2 drives?
 
 I had an onboard RAID a few years back that would let me use RAID1
 across up to 4 drives.

Currently the only RAID level that fully works in BTRFS is RAID-1 with data on 
2 disks.  If you have 4 disks in the array then each block will be on 2 of the 
disks.  RAID-5/6 code mostly works but the last report I read indicated that 
some situations for recovery and disk replacement didn't work - presumably 
anyone who's afraid of multiple disks failing isn't going to want to trust 
BTRFS RAID-6 code at the moment.

If you want to have 4 disks in a fully redundant configuration (IE you could 
lose 3 disks without losing any data) then the thing to do is to have 2 RAID-1 
arrays with Linux software RAID and then run BTRFS RAID-1 on top of that.
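
To make the difference between the two layouts concrete, here is a minimal Python sketch that brute-forces which disk failures each layout survives. It is only a model, not anything taken from btrfs or md: the names `survives` and `max_tolerated_failures` are made up for illustration, and it assumes that on a long-lived 4-disk btrfs raid1 the two copies of some chunk end up on every possible pair of disks.

```python
from itertools import combinations


def survives(copy_sets, failed):
    """True if every block still has at least one copy on a working disk."""
    return all(any(d not in failed for d in copies) for copies in copy_sets)


def max_tolerated_failures(copy_sets, disks):
    """Largest k such that ANY k failed disks still leave all data readable."""
    tolerated = 0
    for k in range(1, len(disks) + 1):
        if all(survives(copy_sets, set(f)) for f in combinations(disks, k)):
            tolerated = k
        else:
            break
    return tolerated


disks = [0, 1, 2, 3]

# Assumption: btrfs raid1 keeps 2 copies per chunk, and over time some
# chunk lands on every pair of disks.
btrfs_raid1 = [set(pair) for pair in combinations(disks, 2)]

# Stacked layout: md mirrors (0,1) and (2,3) with btrfs raid1 across the
# two md devices, so every block ends up on all four disks.
stacked = [set(disks)]

print(max_tolerated_failures(btrfs_raid1, disks))
print(max_tolerated_failures(stacked, disks))
```

Under those assumptions the plain 4-disk btrfs raid1 is only guaranteed to survive one failed disk, while the stacked layout survives any three.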

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/



Re: [Question] Btrfs on iSCSI device

2014-06-27 Thread Austin S Hemmelgarn
On 06/27/2014 07:40 PM, Russell Coker wrote:
 On Fri, 27 Jun 2014 18:34:34 Goffredo Baroncelli wrote:
 I don't think that it is possible to mount the _same device_ at the _same
 time_ on two different machines. And this doesn't depend on the filesystem.
 
 If you use a clustered filesystem then you can safely mount it on multiple 
 machines.
 
 If you use a non-clustered filesystem it can still mount and even appear to 
 work for a while.  It's surprising how many writes you can make to a dual-
 mounted filesystem that's not designed for such things before you get a 
 totally broken filesystem.
 
 On Fri, 27 Jun 2014 13:15:16 Austin S Hemmelgarn wrote:
 The reason it appears to work when using iSCSI and not with directly
 connected parallel SCSI or SAS is that iSCSI doesn't provide low level
 hardware access.
 
 I've tried this with dual-attached FC and had no problems mounting.  In what 
 way is directly connected SCSI different from FC?
 
FC is actually its own networking stack (and you can even run (in
theory) other protocols like IP and ATM on top of it), whereas parallel
SCSI is just a multi-drop bus, and SAS is just a tree-structured bus
with point-to-point communications emulated on top of it.  In other
words, parallel SCSI has topological constraints like RS-485, SAS has
topology constraints like USB, and FC has topology constraints like
Ethernet.

Secondarily, most filesystems on Linux will let you mount them multiple
times on separate hosts (ext4 has features to prevent this, but they are
expensive and therefore turned off by default, I think XFS might have
similar features, but I'm not sure).  BTRFS should in theory be more
resilient than most because of the COW nature (as long as it's only a
few commit cycles, you should still be able to recover most of the data
just fine).





Re: RAID1 3+ drives

2014-06-27 Thread Duncan
Russell Coker posted on Sat, 28 Jun 2014 10:51:00 +1000 as excerpted:

 On Fri, 27 Jun 2014 20:30:32 Zack Coffey wrote:
 Can I get more protection by using more than 2 drives?
 
 I had an onboard RAID a few years back that would let me use RAID1
 across up to 4 drives.
 
 Currently the only RAID level that fully works in BTRFS is RAID-1 with
 data on 2 disks.

Not /quite/ correct.  Raid0 works, but of course that isn't exactly 
RAID as it's not redundant.  And raid10 works.  But that's simply 
raid0 over raid1.  So depending on whether you consider raid0 actually 
RAID or not, which in turn depends on how strict you are with the 
redundant part, there is or is not more than btrfs raid1 working.

 If you have 4 disks in the array then each block will
 be on 2 of the disks.

Correct.

FWIW I'm told that the paper that laid out the original definition of 
RAID (which was linked on this list in a similar discussion some months 
ago) defined RAID-1 as paired redundancy, no matter the number of 
devices.  Various implementations (including Linux' own mdraid soft-raid, 
and I believe dmraid as well) feature multi-way-mirroring aka N-way-
mirroring such that N devices equals N way mirroring, but that's an 
implementation extension and isn't actually necessary to claim RAID-1 
support.

So look for N-way-mirroring when you go RAID shopping, and no, btrfs does 
not have it at this time, altho it is roadmapped for implementation after 
completion of the raid5/6 code.

FWIW, N-way-mirroring is my #1 btrfs wish-list item too, not just for 
device redundancy, but to take full advantage of btrfs data integrity 
features, allowing to scrub a checksum-mismatch copy with the content 
of a checksum-validated copy if available.  That's currently possible, 
but due to the pair-mirroring-only restriction, there's only one 
additional copy, and if it happens to be bad as well, there's no 
possibility of a third copy to scrub from.  As it happens my personal 
sweet-spot between cost/performance and reliability would be 3-way 
mirroring, but once they code beyond N=2, N should go unlimited, so N=3, 
N=4, N=50 if you have a way to hook them all up... should all be possible.
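
A rough Python sketch of the scrub idea described above, purely as an illustration and not how btrfs implements it: `scrub_block` is a made-up helper, and `zlib.crc32` stands in for the crc32c checksum btrfs actually uses. Each copy of a block is checked against the stored checksum, and bad copies are rewritten from the first copy that validates.

```python
import zlib


def scrub_block(copies, expected_crc):
    """Rewrite bad copies from the first copy whose checksum matches.

    Returns (copies_after_scrub, recovered). If no copy validates, the
    copies are returned unchanged and recovered is False.
    """
    good = next((c for c in copies if zlib.crc32(c) == expected_crc), None)
    if good is None:
        return list(copies), False
    return [good] * len(copies), True


data = b"btrfs block"
crc = zlib.crc32(data)       # checksum recorded when the block was written
bad_a = b"btrfs blocK"       # two differently corrupted copies
bad_b = b"Btrfs block"

# Two-way mirroring: both copies bad, nothing left to scrub from.
print(scrub_block([bad_a, bad_b], crc))

# Three-way mirroring: the third copy validates and repairs the other two.
print(scrub_block([bad_a, bad_b, data], crc))
```

With only two copies a double corruption is unrecoverable, and the third copy is exactly what makes the second call succeed.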

But...

 RAID-5/6 code mostly works but the last report I
 read indicated that some situations for recovery and disk replacement
 didn't work - presumably anyone who's afraid of multiple disks failing
 isn't going to want to trust BTRFS RAID-6 code at the moment.

The raid5/6 code was on the list to be introduced in the next kernel or 
two something like two years ago, when I originally looked into it, and 
likely before that.  Like many of the btrfs features, it actually took 
rather longer to cook than was in the original plan -- it's actually 
rather more complicated than anticipated, and additionally it has been 
put off a few times to work on bugfixing currently supported feature 
bugs.  An incomplete raid56 implementation, normal runtime but not scrub 
or recovery, was introduced several kernels ago now, but it's still not 
complete.

So N-way-mirroring, which is supposed to build on several bits of the 
raid5/6 implementation and therefore is roadmapped for after it, 
continues to look about the same 3-5 kernels off, after raid5/6, as it 
did two years ago.  Except, having seen the raid5/6 timing, and having 
looked back at btrfs feature history going back rather longer, even if 
raid5/6 was declared finished for kernel 3.17 (since 3.16 is past the 
commit window), I'd guess it'd probably take another five kernels (a 
year's worth) or so, at /least/, for N-way-mirroring to properly cook.

So in actuality I'd be surprised to see any N-way-mirroring code at all 
before next spring, and would /not/ be surprised at all to see it take 
all of next year to fully cook to completion.

Not that I'm complaining /too/ much.  We work with what we have and btrfs 
as it is is quite beyond the features of most filesystems (just the data 
integrity and multi-device filesystem stuff at all, is great to work 
with, besides the stuff like subvolumes and snapshotting that doesn't fit 
my use-case that well =:^), even if it /is/ all presently limited to two-
way-mirroring! =:^\ ).  But it will sure be nice when I /can/ count on 
that third copy to scrub two bad copies, if two copies /do/ happen to be 
bad.

 If you want to have 4 disks in a fully redundant configuration (IE you
 could lose 3 disks without losing any data) then the thing to do is to
 have 2 RAID-1 arrays with Linux software RAID and then run BTRFS RAID-1
 on top of that.

The caveat with that is that at least mdraid1/dmraid1 has no verified 
data integrity, and while mdraid5/6 does have 1/2-way-parity calculation, 
it's only used in recovery, NOT cross-verified in ordinary use.

So it's not a proper substitute, tho I guess some big-money hardware 
raids might do it.

In fact, with md/dmraid and its reasonable possibility of silent 
corruption since at that level any of the copies could be returned and