Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-10 Thread Kay Sievers
On Mon, Mar 10, 2014 at 7:34 PM, Goffredo Baroncelli kreij...@libero.it wrote:
 On 03/07/2014 07:26 PM, Lennart Poettering wrote:

 Since yesterday systemd in git can now discover root, /home, /srv and
 swap partitions automatically based on GPT type GUIDs, thus making
 /etc/fstab unnecessary for simple setups.

 I have now put together something like a spec describing the logic
 behind that, and what it is good for:

 http://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/


 From the FAQ:
 CITE
 [...] What about automatic mounting of btrfs subvolumes to /var, /home and so 
 on?

 Doing a similar automatic discovery of btrfs subvolumes and mounting them 
 automatically to the appropriate places is certainly desirable. We are 
 waiting for the btrfs designers to add a per-subvolume type UUID to their 
 disk format to make this possible. [...]
 /CITE

 Instead of relying on the subvolume UUID, why not rely on the subvolume
 name? It would be simpler and more flexible to manage them.

As a general rule: human-readable names should be left to the
administrator, provide an identifier for humans, and should not be
overloaded with magic machine behavior.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'umount' of multi-device volume hangs until the device is physically un-plugged

2012-09-17 Thread Kay Sievers
On Mon, Sep 17, 2012 at 7:19 PM, Josef Bacik jba...@fusionio.com wrote:
 On Sun, Sep 16, 2012 at 10:07:39PM -0600, Kay Sievers wrote:
 I'm currently playing around with native btrfs multi-device support in
 systemd. There might be a few hotplug issues to solve, here is the
 first one:

 A mounted (otherwise unused) multi-device volume (USB multi-slot card
 reader), hangs at:
   $ umount /mnt
 with (fedora) kernel
   3.6.0-0.rc5.git0.1.fc18.x86_64

 Any idea what to look for or what to try?

 Can I see the whole sysrq+w?  Also can you try btrfs-next and see if you have
 the same problems?  Thanks,

Hmm, I can't reproduce that today. Nothing has really changed with the
setup. It was easy to reproduce yesterday, even across multiple
reboots.

I'll come back if I see it again. Thanks,
Kay


BTRFS_IOC_DEVICES_READY and removed devices

2012-09-17 Thread Kay Sievers
We are currently playing around with native btrfs multi-device support
in systemd. We already committed the needed pieces to systemd git, to
register all detected btrfs filesystems with the kernel.

For volumes which are listed in fstab for mounting, we delay the
actual mount attempt of a multi-device volume until we see READY
returned from BTRFS_IOC_DEVICES_READY. With a UUID= line in /etc/fstab
and nofail in the options field, we can boot up without any device
plugged in. Devices can then be plugged in one after the other until
the volume has a full tree of devices; once the last device is there,
systemd just mounts the volume as expected.
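
As a rough illustration of the readiness check described above, here is
a minimal C sketch of asking the kernel whether a volume is complete.
It assumes the BTRFS_IOC_DEVICES_READY ioctl is issued on
/dev/btrfs-control with a struct btrfs_ioctl_vol_args from
<linux/btrfs.h> naming one member device, and that the call returns 0
when all devices are present and a positive value when some are still
missing. The function name btrfs_volume_ready() is made up for this
example; it is not the code systemd uses.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/*
 * Ask the kernel whether all member devices of the btrfs volume that
 * `devnode` belongs to have been registered.
 * Returns 1 if ready, 0 if devices are still missing, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int btrfs_volume_ready(const char *devnode)
{
	struct btrfs_ioctl_vol_args args;
	int fd, r;

	/* The device path must fit into the fixed-size name field. */
	if (strlen(devnode) >= sizeof(args.name))
		return -1;

	fd = open("/dev/btrfs-control", O_RDWR);
	if (fd < 0)
		return -1;

	memset(&args, 0, sizeof(args));
	strcpy(args.name, devnode);

	/* Assumed: 0 = all devices present, > 0 = incomplete, < 0 = error */
	r = ioctl(fd, BTRFS_IOC_DEVICES_READY, &args);
	close(fd);

	if (r < 0)
		return -1;
	return r == 0;
}
```

A caller would invoke this once per newly appearing member device and
only attempt the mount when it returns 1. Note that this is exactly the
check that can report a stale "ready" in the unplug scenario below.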

This seems to work very well so far, unless a device which is already
registered disappears, which is a kind of valid hotplug scenario we
should handle better:

If one device of a 2-device volume is registered with the in-kernel
cache and is then unplugged from the system, the cache state does not
get updated. If the other device of the 2-device volume is then
registered, BTRFS_IOC_DEVICES_READY indicates ready, but in fact only
one of the two needed devices is available at that time, and mounting
fails.

Can we somehow subscribe to device media-changes/removal to prevent
the stale device state in the in-kernel cache?

Or alternatively make BTRFS_IOC_DEVICES_READY re-validate all involved
block devices before it returns READY?

Thanks,
Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Tue, Jun 5, 2012 at 11:28 PM, Andrew Morton
a...@linux-foundation.org wrote:
 On Tue,  5 Jun 2012 02:46:29 -0700
 Joe Perches j...@perches.com wrote:

 KERN_<LEVEL> currently takes up 3 bytes.
 Shrink the kernel size by using an ASCII SOH and then the level byte.
 Remove the need for KERN_CONT.
 Convert directly embedded uses of <.> to KERN_<LEVEL>

 What an epic patchset.  I guess that saving a byte per printk does make
 the world a better place, and forcibly ensuring that nothing is
 dependent upon the internal format of the KERN_foo strings is nice.


 Unfortunately the <n> thing is part of the kernel ABI:

        echo "<4>foo" > /dev/kmsg

 devkmsg_writev() does weird and wonderful things with
 facilities/levels.  That function incorrectly returns success when
 copy_from_user() faults, btw.  It also babbles on about LOG_USER and
 LOG_KERN without ever defining these things.  I guess they're
 userspace-only concepts and are hardwired to 0 and 1 in the kernel.  Or
 not.

It's as old as BSD, defined by syslog(3), used by glibc. The whole
<%u> prefix notation and the LOG_* names come from there.

The kernel is just user/facility == 0, so it never really was apparent
that the whole concept has more than a log level in that number.

Userspace syslog defines these pretty stupid numbers:
  /* facility codes */
  #define LOG_KERN      (0<<3)  /* kernel messages */
  #define LOG_USER      (1<<3)  /* random user-level messages */
  #define LOG_MAIL      (2<<3)  /* mail system */
  #define LOG_DAEMON    (3<<3)  /* system daemons */
  #define LOG_AUTH      (4<<3)  /* security/authorization messages */
  #define LOG_SYSLOG    (5<<3)  /* messages generated internally by syslogd */
  #define LOG_LPR       (6<<3)  /* line printer subsystem */
  #define LOG_NEWS      (7<<3)  /* network news subsystem */
  #define LOG_UUCP      (8<<3)  /* UUCP subsystem */
  #define LOG_CRON      (9<<3)  /* clock daemon */
  #define LOG_AUTHPRIV  (10<<3) /* security/authorization messages (private) */
  #define LOG_FTP       (11<<3) /* ftp daemon */

but it *can* still all be pretty useful, and people *can* get creative
with facility numbers if they want to, as we have like 13 bits at the
moment to use for the facility which is stored in the kernel log
buffer. :)
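
To make the encoding concrete, here is a small C sketch of how the
number inside a syslog-style prefix splits into facility and level: the
lower 3 bits are the level, everything above is the facility. The
struct and function names are made up for this example.

```c
/* Parts of a syslog priority value, i.e. the number inside "<N>". */
struct prio_parts {
	int facility;	/* facility number, e.g. 0 = kern, 1 = user */
	int level;	/* log level, 0..7 */
};

/* Hypothetical helper: split a priority into facility and level. */
struct prio_parts split_priority(unsigned int prio)
{
	struct prio_parts p;

	p.level = prio & 7;	/* lower 3 bits */
	p.facility = prio >> 3;	/* remaining upper bits */
	return p;
}
```

With this, a priority of 12, as it shows up in the /dev/kmsg records
quoted elsewhere in this thread, decodes to facility 1 (LOG_USER) and
level 4.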

/dev/kmsg just enforces LOG_USER if userspace tries to inject stuff
with LOG_KERN, which it should not be allowed to do. The non-LOG_KERN
number itself does not have much meaning; it just says "this is not
from the kernel", which is important to keep in the message.

Also, dmesg(1) has a -k option that filters out all userspace-injected stuff.

 So what to do about /dev/kmsg?  I'd say nothing: we retain <n> as
 the externally-presented kernel format for a facility level, and the
 fact that the kernel internally uses a different encoding is hidden
 from userspace.

Yeah, I think so.

Yeah, we strip the <n> prefix at printk() time and add the <n> back at
output time; the prefixes are not stored internally anymore, so that
should not affect the current behaviour.

 And if the user does

        echo "\0014foo" > /dev/kmsg

 then I guess we should pass it straight through, retaining the \0014.
 But from my reading of your code, this doesn't work - vprintk_emit()
 will go ahead and strip and interpret the \0014, evading the stuff
 which devkmsg_writev() did.

We should make it not accept faked prefixes, yes. It should be
impossible to let messages look like they originated from the kernel,
just like the current code enforces a non-LOG_KERN <n> prefix.

Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Tue, 2012-06-05 at 14:28 -0700, Andrew Morton wrote:

 devkmsg_writev() does weird and wonderful things with
 facilities/levels.  That function incorrectly returns success when
 copy_from_user() faults, btw.

Oh. Better?

Thanks,
Kay


From: Kay Sievers k...@vrfy.org
Subject: kmsg: /dev/kmsg - properly return possible copy_from_user() failure

Reported-By: Andrew Morton a...@linux-foundation.org
Signed-off-by: Kay Sievers k...@vrfy.org
---
 printk.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..6bdacab 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -365,8 +365,10 @@ static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
 
 	line = buf;
 	for (i = 0; i < count; i++) {
-		if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len))
+		if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len)) {
+			ret = -EFAULT;
 			goto out;
+		}
 		line += iv[i].iov_len;
 	}
 




Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Wed, Jun 6, 2012 at 1:35 AM, Joe Perches j...@perches.com wrote:
 On Tue, 2012-06-05 at 16:29 -0700, Andrew Morton wrote:
 What about writes starting with \001n?  AFAICT, that will be stripped
 away and the printk level will be altered.  This is new behavior.

 Nope.

 # echo "\001Hello Andrew" > /dev/kmsg
 /dev/kmsg has
 12,774,2462339252;\001Hello Andrew

Try echo -e? The stuff is copied verbatim otherwise.

Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Wed, Jun 6, 2012 at 1:43 AM, Joe Perches j...@perches.com wrote:
 On Wed, 2012-06-06 at 01:39 +0200, Kay Sievers wrote:

  # echo "\001Hello Andrew" > /dev/kmsg
  /dev/kmsg has
  12,774,2462339252;\001Hello Andrew

 Try echo -e? The stuff is copied verbatim otherwise.

 # echo -e "\001Hello Kay" > /dev/kmsg
 gives
 12,776,3046752764;\x01Hello Kay

Don't you need two bytes to trigger the logic?

Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Wed, Jun 6, 2012 at 2:07 AM, Joe Perches j...@perches.com wrote:
 On Tue, 2012-06-05 at 16:58 -0700, Andrew Morton wrote:
  echo "\0014Hello Joe" > /dev/kmsg

 # echo -e "\x014Hello Me" > /dev/kmsg
 gives:
 12,778,4057982669;Hello Me

 # echo -e "\x011Hello Me_2" > /dev/kmsg
 gives:
 12,779,4140452093;Hello Me_2

 I didn't change devkmsg_writev so the
 original parsing style for <.> is
 unchanged.

 from printk.c:

 static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
                              unsigned long count, loff_t pos)
 [...]
        int level = default_message_loglevel;
 [...]
        if (line[0] == '<') {
                char *endp = NULL;

                i = simple_strtoul(line+1, &endp, 10);
                if (endp && endp[0] == '>') {
                        level = i & 7;
                        if (i >> 3)
                                facility = i >> 3;
                        endp++;
                        len -= endp - line;
                        line = endp;
                }
        }
        line[len] = '\0';

        printk_emit(facility, level, NULL, 0, "%s", line);
 [...]

 level is what matters.

 from dmesg -r

 <12>[ 2462.339252] \001Hello Andrew
 <9>[ 2516.023444] Hello Andrew
 <12>[ 3046.752764] \x01Hello Kay
 <12>[ 3940.871850] \x01Hello Kay
 <12>[ 4057.982669] Hello Me
 <12>[ 4140.452093] Hello Me_2

The question is what happens if you inject your new binary two-byte
prefix, like:
  echo -e "\x01\x02Hello" > /dev/kmsg

And if that changes the log-level to 2 instead of the default 4?

(assuming that I read your patch right, otherwise please correct the
bytes, but use the full sequence which your patch will recognize as an
internal level marker; seems your examples are all not triggering that)

Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Wed, Jun 6, 2012 at 2:19 AM, Joe Perches j...@perches.com wrote:
 On Wed, 2012-06-06 at 02:13 +0200, Kay Sievers wrote:
 The question is what happens if you inject your new binary two-byte
 prefix, like:
   echo -e "\x01\x02Hello" > /dev/kmsg

 It's not a 2 byte binary.
 It's a leading ascii SOH and a standard ascii char
 '0' ... '7' or 'd'.

 #define KERN_EMERG      KERN_SOH "0"    /* system is unusable */
 #define KERN_ALERT      KERN_SOH "1"    /* action must be taken immediately */
 etc...

Ok.

 And if that changes the log-level to 2 instead of the default 4?

 No it doesn't.

So:
   echo -e "\x012Hello" > /dev/kmsg
is still level 4? Sounds all fine then.

 It's not triggering that because devkmsg_writev does
 prefix parsing only on the old <n> form.

Yeah, but printk_emit() will not try to parse it? I did not check, but
with your change, the prefix parsing in printk_emit() is still skipped
if a level is given as a parameter to printk_emit(), right?
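
For reference, the new prefix form Joe describes (a leading ASCII SOH
followed by one of '0' ... '7' or 'd') can be recognized with a check
like the one below. This is only a sketch of the format as described in
this thread, not the code from the patch; the function name
soh_prefix_len() is invented for the example.

```c
#include <stddef.h>

#define SOH '\001'	/* ASCII start-of-header byte */

/*
 * If msg starts with SOH followed by '0'..'7' or 'd', return the number
 * of prefix bytes (2) and store the level character in *level_char.
 * Otherwise return 0 and leave *level_char untouched.
 * (Hypothetical helper, for illustration only.)
 */
size_t soh_prefix_len(const char *msg, char *level_char)
{
	if (msg[0] != SOH)
		return 0;
	/* msg[1] is at worst the terminating NUL, so this read is safe. */
	if ((msg[1] >= '0' && msg[1] <= '7') || msg[1] == 'd') {
		*level_char = msg[1];
		return 2;
	}
	return 0;
}
```

A writer to /dev/kmsg that wanted to refuse faked prefixes could reject
or escape any message for which this returns non-zero before handing it
on.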

Kay


Re: [PATCH 0/8] Rework KERN_<LEVEL>

2012-06-05 Thread Kay Sievers
On Wed, Jun 6, 2012 at 2:46 AM, Andrew Morton a...@linux-foundation.org wrote:
 On Tue, 05 Jun 2012 17:40:05 -0700 Joe Perches j...@perches.com wrote:

 On Tue, 2012-06-05 at 17:37 -0700, Andrew Morton wrote:
  On Tue, 05 Jun 2012 17:07:27 -0700 Joe Perches j...@perches.com wrote:
 
   On Tue, 2012-06-05 at 16:58 -0700, Andrew Morton wrote:
 echo "\0014Hello Joe" > /dev/kmsg
  
   # echo -e "\x014Hello Me" > /dev/kmsg
   gives:
   12,778,4057982669;Hello Me
 
  That's changed behavior.

 Which is an improvement too.

 No it isn't.  It exposes internal kernel implementation details in
 random weird inexplicable ways.  It doesn't seem at all important
 though.

 I very much doubt a single app will change
 because of this.

 I doubt it as well.

Yeah, the value of injecting such binary data is kind of questionable. :)

Joe, maybe you can change printk_emit() to skip the prefix
detection/stripping if a prefix is already passed to the function?

Kay


Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-30 Thread Kay Sievers
On Thu, Sep 30, 2010 at 01:43, Christoph Hellwig h...@infradead.org wrote:
 On Wed, Sep 29, 2010 at 10:04:31AM +0200, Kay Sievers wrote:
 On Wed, Sep 29, 2010 at 09:25, Ric Wheeler rwhee...@redhat.com wrote:

  Second question is why is checking in /sys a big deal, would you prefer an
  interface like we did for alignment in libblkid?

 It's about knowing what's behind the 'nodev' major == 0 of a btrfs
 mount. There is no way to get that from /sys or anywhere else at the
 moment.

 Usually filesystems backed by a disk have the dev_t of the device, or
 the fake block devices like md/dm/raid have their own major and the
 slaves/ directory pointing to the devices.

 This is not only about readahead, it's every other tool, that needs to
 know what kind of disks are behind a btrfs 'nodev' major == 0 mount.

 Thanks for explaining the problem.  It's one that affects everything
 with more than one underlying block device, so adding a
 filesystem-specific ioctl hack is not a good idea.  As mentioned in this
 mail we already have a solution for that - the block device slaves
 links used for raid and volume managers.  The most logical fix is to
 re-use that for btrfs as well and stop it from abusing the anonymous
 block major that was never intended for block based filesystems (and
 already has caused trouble in other areas).  One way to do this might
 be to allocate a block major for btrfs that only gets used for
 representing these links.

Yeah, we thought about that too, but a btrfs mount does not show up as
a block device, like md/dm, so there is no place for a slaves/
directory in /sys with the individual disks listed. How could we solve
that? Create some fake blockdev for every btrfs mount, one that can't
be used to read/write raw blocks?

A generic solution, statfs()-like, which operates at the superblock
level would be another option. Any idea if that could be made to work?
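
To show what the existing statfs() call already offers from the
filesystem side, here is a short C sketch that tells whether a path
lives on btrfs. It only reports the filesystem type; it cannot list the
backing devices, which is the gap under discussion. The helper name
is_btrfs() is made up, and BTRFS_SUPER_MAGIC is assumed to come from
<linux/magic.h>.

```c
#include <sys/vfs.h>
#include <linux/magic.h>

/*
 * Returns 1 if `path` is on a btrfs filesystem, 0 if it is on some
 * other filesystem, -1 if statfs() fails.
 * (Hypothetical helper, for illustration only.)
 */
int is_btrfs(const char *path)
{
	struct statfs s;

	if (statfs(path, &s) < 0)
		return -1;
	/* Compare as unsigned so the test also holds where f_type is 32 bit. */
	return (unsigned long)s.f_type == (unsigned long)BTRFS_SUPER_MAGIC;
}
```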

Kay


Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-30 Thread Kay Sievers
On Thu, Sep 30, 2010 at 21:48, Josef Bacik jo...@redhat.com wrote:
 On Wed, Sep 29, 2010 at 07:43:27PM -0400, Christoph Hellwig wrote:
 On Wed, Sep 29, 2010 at 10:04:31AM +0200, Kay Sievers wrote:
  On Wed, Sep 29, 2010 at 09:25, Ric Wheeler rwhee...@redhat.com wrote:
 
   Second question is why is checking in /sys a big deal, would you prefer an
   interface like we did for alignment in libblkid?
 
  It's about knowing what's behind the 'nodev' major == 0 of a btrfs
  mount. There is no way to get that from /sys or anywhere else at the
  moment.
 
  Usually filesystems backed by a disk have the dev_t of the device, or
  the fake block devices like md/dm/raid have their own major and the
  slaves/ directory pointing to the devices.
 
  This is not only about readahead, it's every other tool, that needs to
  know what kind of disks are behind a btrfs 'nodev' major == 0 mount.

 Thanks for explaining the problem.  It's one that affects everything
 with more than one underlying block device, so adding a
 filesystem-specific ioctl hack is not a good idea.  As mentioned in this
 mail we already have a solution for that - the block device slaves
 links used for raid and volume managers.  The most logical fix is to
 re-use that for btrfs as well and stop it from abusing the anonymous
 block major that was never intended for block based filesystems (and
 already has caused trouble in other areas).  One way to do this might
 be to allocate a block major for btrfs that only gets used for
 representing these links.


 Ok I've spent a few hours on this and I'm hitting a wall.  In order to get the
 sort of /sys/block/btrfs-# sort of thing I have to do

 1) register_blkdev to get a major
 2) setup a gendisk
 3) do a bdget_disk
 4) Loop through all of our devices and do a bd_claim_by_disk on each of them

 This sucks because for step #2 I have to have a request_queue for the disk.
 It's a bogus disk, and there's no way to not have a request_queue, so I'd have
 to wire that up and put a bunch of WARN_ON()'s to make sure nobody is trying
 to write to our special disk (since I assume that if I go through all this
 crap I'm going to end up with a /dev/btrfs-# that people are going to try to
 write to).

 So my question is, is this what we want?  Do I just need to quit bitching and
 make it work?  Or am I doing something wrong?  This is a completely new area
 for me so I'm just looking around at what md/dm does and trying to mirror it
 for my own uses, if that's not what I should be doing please tell me,
 otherwise this seems like a lot of work for a very shitty solution to our
 problem.  Thanks,

Yeah, that matches what I was experiencing when thinking about the
options. Making a btrfs mount a fake blockdev of zero size seems like
a pretty weird hack, just to get some 'dead' directories in sysfs. A
btrfs mount is just not a raw blockdev, and should probably not
pretend to be one.

I guess a statfs()-like call from the filesystem side and not the
block side, which can put out such information in some generic way,
would better fit here.

Kay


Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-29 Thread Kay Sievers
On Wed, Sep 29, 2010 at 09:25, Ric Wheeler rwhee...@redhat.com wrote:

 Second question is why is checking in /sys a big deal, would  you prefer an
 interface like we did for alignment in libblkid?

It's about knowing what's behind the 'nodev' major == 0 of a btrfs
mount. There is no way to get that from /sys or anywhere else at the
moment.

Usually filesystems backed by a disk have the dev_t of the device, or
the fake block devices like md/dm/raid have their own major and the
slaves/ directory pointing to the devices.

This is not only about readahead, it's every other tool, that needs to
know what kind of disks are behind a btrfs 'nodev' major == 0 mount.
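
What a tool sees today can be sketched in C as follows: stat() the
mount point and look at the major number of st_dev. For a btrfs mount
it is 0, so there is nothing to look up under /sys. The helper names
are made up for this example.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* A dev_t with major 0 is an anonymous ("nodev") device, not a disk. */
int is_anonymous_dev(dev_t dev)
{
	return major(dev) == 0;
}

/*
 * Returns 1 if `path` lives on a filesystem with an anonymous dev_t,
 * 0 if st_dev carries a real block device major, -1 if stat() fails.
 * (Hypothetical helper, for illustration only.)
 */
int path_on_anonymous_dev(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	return is_anonymous_dev(st.st_dev);
}
```

For a filesystem directly on a disk, the major/minor pair leads to
/sys/dev/block/MAJOR:MINOR; for major == 0 that lookup has no
counterpart.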

Kay


Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-29 Thread Kay Sievers
On Wed, Sep 29, 2010 at 01:25, Christoph Hellwig h...@infradead.org wrote:
 On Tue, Sep 28, 2010 at 04:53:16PM -0400, Josef Bacik wrote:
 This was a request from the systemd guys.  They need a quick and easy way to
 get all devices attached to a Btrfs filesystem in order to check if any of the
 disks are SSD for...something, I didn't ask :).   I've tested this with the
 btrfs-progs patch that accompanies this patch.  Thanks,

 So please tell the systemd guys to explain what the fuck they're doing
 to linux-fsdevel and find a proper interface.  Chances that they will fuck
 up as much as just about every other lowlevel userspace tool are very
 high.

Fuck like these comments make it incredibly hard to find the few
statements where you are right, in all the fucking noise you are
creating.

Thanks,
Kay


Re: [RFC] btrfs, udev and btrfs

2010-04-27 Thread Kay Sievers
On Fri, Apr 16, 2010 at 20:48, Goffredo Baroncelli kreij...@gmail.com wrote:
 Instead the first option has the disadvantage to need to be used for every new
 device.
 From this observation I write a udev rule which scan the new block devices,
 excluding floppy and cdrom.

 Below my udev rule

  $ cat /etc/udev/rules.d/60-btrfs.rules

  # ghigo 15/04/2010

  ACTION!="add|change", GOTO="btrfs_scan_end"
  SUBSYSTEM!="block", GOTO="btrfs_scan_end"
  KERNEL!="sd[!0-9]*|hd[!0-9]*", GOTO="btrfs_scan_end"

  IMPORT{program}="/sbin/blkid -o udev -p $tempnode"

Udev needs to do this already anyway. People are not encouraged to
call this in their own rule files again. Just make sure you place the
rule after the existing standard call that always comes with udev. The
btrfs rules can just depend on the variable set in the environment.
Also there are more devices than sd* which could have a btrfs volume,
but this is also covered by the standard udev call to blkid.

Thanks,
Kay


Re: /dev/btrfs-control

2010-01-08 Thread Kay Sievers
On Fri, Jan 8, 2010 at 17:37, Michael Niederle mniede...@gmx.at wrote:
 Some btrfs-tools make use of /dev/btrfs-control.

 How should I create this node? Is this a block or a character device (I
 suppose it should be a character device), which major and minor numbers
 should be assigned?

It's a char device node:
  $ cat /sys/class/misc/btrfs-control/dev
  10:62
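
If the node has to be created by hand, the numbers can be read from
that sysfs attribute instead of being hard-coded. A minimal C sketch of
parsing the "MAJOR:MINOR" text into a dev_t follows; the function name
parse_sysfs_dev() is made up for this example.

```c
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Parse a sysfs "dev" attribute of the form "MAJOR:MINOR".
 * Returns 0 on success and stores the device number in *dev,
 * -1 if the text does not have the expected form.
 * (Hypothetical helper, for illustration only.)
 */
int parse_sysfs_dev(const char *text, dev_t *dev)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*dev = makedev(maj, min);
	return 0;
}
```

The resulting dev_t can then be passed to mknod(2) together with
S_IFCHR and the desired permissions to create /dev/btrfs-control.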

Kay


Re: btrfs and /proc/self/mountinfo

2009-04-12 Thread Kay Sievers
On Sun, Apr 12, 2009 at 07:35, David Zeuthen zeut...@gmail.com wrote:

 But we'd still need some kind of way of having the kernel tell user
 space what devices are currently claimed by the btrfs filesystem
 instance (and we'd need notifications on changes too). Otherwise we
 don't have enough information for the desktop shell and partitioning
 programs to let the user know that /dev/sdb2 or /dev/sdc1 or whatever
 is currently claimed by the 0:19 btrfs mount at /media/fedora-usb.

 One idea is to have a pollable file, /proc/fs/btrfs/devs, that maps
 from the dev_t of the btrfs filesystem instance (as used in
 /proc/self/mountinfo) to the set of dev_t for block devices currently
 claimed? E.g. for the example above we'd have

  /proc/fs/btrfs/devs:
  0:19    8:18

Btrfs used to export some information in /sys/fs/btrfs/, this is
disabled for now. Maybe we can possibly make it export something like:
  $ tree /sys/fs/btrfs/
  /sys/fs/btrfs/
  |-- 969d1386-a002-4c28-94f2-47be23f344e4
  |   |-- ba1532f3-849b-400b-9c76-2c9aee126c52
  |   |   |-- device -> ../../../devices/.../block/sda/sda3
  |   |   |-- attribute1
  |   |   |-- ...
  |   `-- 45645656-849b-400b-9c76-2c9aee126c52
  |       |-- device -> ../../../devices/.../block/sdb/sdb3
  |       |-- attribute1
  |
  `-- 645645686-a002-4c28-94f2-47be23f344e4
      |-- ...


So you could look for a device link at the subvolume devices? Or if
that does not fit for some reason, we could also add a btrfs class,
to export details about the subvolumes.

Thanks,
Kay


Re: [GIT PULL] adaptive spinning mutexes

2009-01-14 Thread Kay Sievers
On Wed, Jan 14, 2009 at 22:41, Ingo Molnar mi...@elte.hu wrote:

 * Ingo Molnar mi...@elte.hu wrote:

  You just disproved your own case :(

 how so? 80% is not enough? I also checked Fedora and it has
 SCHED_DEBUG=y in its kernel rpms.

 Ubuntu has CONFIG_SCHED_DEBUG=y as well in their kernels.

$ cat /etc/SuSE-release
openSUSE 11.1 (x86_64)
VERSION = 11.1

$ uname -a
Linux nga 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64
x86_64 x86_64 GNU/Linux

$ zgrep SCHED_DEBUG /proc/config.gz
CONFIG_SCHED_DEBUG=y

$ zgrep DEBUG_MUTEX /proc/config.gz
# CONFIG_DEBUG_MUTEXES is not set

Kay


Re: weird bash autocomplete issue

2008-12-18 Thread Kay Sievers
On Fri, Dec 19, 2008 at 01:59,  devz...@web.de wrote:
 I see the same issue on x86 32 bit, with the additional __llseek()
 between the getdents64(), and the last entry returned by readdir
 ignored.

 confirmed - it`s readdir which assumes 32bit.

 attached is a sample program which shows the issue on my system.

 if compiled with -D_FILE_OFFSET_BITS=64, the problem goes away.

 old posting from around 2001:

http://sourceware.org/ml/libc-alpha/2001-01/msg00216.html

This is why everybody will have to compile programs with
_FILE_OFFSET_BITS=64.  Did you ever notice that all GNU programs
already do this?

 as 32bit systems can use 64bit filesystems, i think btrfs is correct and bash
 is wrong, as it isn`t LFS aware. i think all 32bit stuff should be LFS aware,
 nowadays.

 to be exact, it`s not bash but readline library which comes with bash.
 bash configure script correctly checks for _FILE_OFFSET_BITS value, but
 readline configure script doesn`t.
 this explains why i could not reproduce the issue when i build bash without
 readline support.
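
The underlying range problem can be shown with a few lines of C. The
d_off of 9223372036854775807 that strace reports for the last directory
entry in this thread is a valid 64-bit offset, but it cannot be stored
in the signed 32-bit off_t that a 32-bit program gets when it is built
without -D_FILE_OFFSET_BITS=64. The sketch below only demonstrates that
range check; it does not model what the C library's readdir() does
internally, and the function name is made up.

```c
#include <stdint.h>

/*
 * Return 1 if a 64-bit directory offset can be represented in a signed
 * 32-bit off_t, 0 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
int fits_in_off32(int64_t d_off)
{
	return d_off >= INT32_MIN && d_off <= INT32_MAX;
}
```

Small offsets such as 2 or 3 pass this check, while the LLONG_MAX value
used for the final entry does not.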

 does it make sense to file a ticket at novell bugzilla ?

Sure, would be good to have that fixed. Cc: kasiev...@novell.com in
the bug, and I will move it directly to the right guy. :)

Thanks,
Kay


Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 15:17, Chris Mason chris.ma...@oracle.com wrote:
 On Wed, 2008-12-17 at 14:59 +0100, Kay Sievers wrote:
 On Wed, Dec 17, 2008 at 09:45, Roland devz...@web.de wrote:
  On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
 
   open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
   fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555,
   st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=18,
   st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37,
   st_ctime=2008/12/16-21:32:37}) = 0
   getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
   {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
   {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24, d_name="test"}
   {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
   d_name="linux"}}, 4096) = 104
   _llseek(3, 3, [3], SEEK_SET)= 0
   getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,
   d_reclen=32, d_name="linux"}}, 4096) = 32
 
  On Tue, Dec 16, 2008 at 22:26,  devz...@web.de wrote:
   i assume it has something to do with the large value for d_off of the
   last dirent ?
 
  Looks like, 9223372036854775807 is just LLONG_MAX.
 
  I can not reproduce that (on openSUSE 11.1). I also don't see
  the _llseek() calls.
 
  weird. no btrfs issue then !?
 
 
  open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
  fstat(3, {st_dev=makedev(0, 18), ...
  getdents64(3, {
   {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name="."}
   {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=".."}
   {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name="a"}
   {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name="b"}
   {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name="c"}
   {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name="test"}
   {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
  d_name="linux"}
  }, 4096) = 176
  getdents64(3, {}, 4096) = 0
  close(3)
 
  This is with today's git kernel and today's standalone btrfs unstable.
 
  You are using the distro kernel and compile the standalone btrfs module?
 
  yes.
  to be honest, i`m slightly newer than 11.1 (did zypper dup to latest
  factory some days ago)
 
  linux:~ # bash -version
  GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
  Copyright (C) 2007 Free Software Foundation, Inc.

 That is still the same bash, the one you use is a 32bit version. Do
 you run a 32 bit kernel too? I could try that on a 32 bit box then.

 At least on my 32 bit box, tab completion works fine.

It works fine here too on 64 bit. I'll try with openSUSE 11.1 on a
32bit box later tonight.

 But, the d_off of
 LLONG_MAX comes from btrfs_readdir().  Git had a feature where it would
 loop infinitely over a directory in some cases and this was my
 workaround.

There are other filesystems doing the same, usually with 32bit int max
instead of 64 bit int max, I guess that should work fine.

 This should be fixed in git by now, so I can drop it if that really is
 causing problems in bash.

I'll come back if I can reproduce it with the same environment Roland is using.

Kay


Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 09:45, Roland devz...@web.de wrote:
 On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:

 On Tue, Dec 16, 2008 at 21:46,  devz...@web.de wrote:
  On Tue, Dec 16, 2008 at 20:37, Roland devz...@web.de wrote:
   i have come across a weird autocomplete issue i assume it is related
 to
   btrfs.
  
   let`s have some dirs:
  
   /non-btrfs-mount
 ./linux
 ./testdir
  
   /btrfs-mount
 ./linux
 ./testdir
  
   now, if i do cd t<tab> in /non-btrfs-mount, t autocompletes to testdir
   same for l<tab>inux - bash autocompletes as expected.
  
   now, the weird thing is, that on /btrfs-mount this behaves  
   different.
  
   autocompletion for testdir works, but not for linux dir. weird.
  
   can someone reproduce this ?
 
  Open another shell, find the bash process pid of the first shell with:
ps afx
  and do:
strace -p pid
  Go back to the first shell, hit tab, and the trace should show
  what's going on. You see a significant difference there?
 
 
  ok, here we go (i hope i did not cut important parts).
  i don`t see the real issue, but i did another interesting finding - see
   below
 
 
  bad (cd l<tab>):
 
  open(., O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
  fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555, 
  st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,  
  st_size=18,
  st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37, 
  st_ctime=2008/12/16-21:32:37}) = 0
  getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, 
  d_name=.} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,  
  d_name=..}
  {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24,  d_name=test}
  {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,  d_reclen=32,
  d_name=linux}}, 4096) = 104
  _llseek(3, 3, [3], SEEK_SET)= 0
  getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, 
  d_reclen=32, d_name=linux}}, 4096) = 32

 On Tue, Dec 16, 2008 at 22:26,  devz...@web.de wrote:
  i assume it has something to do with the large value for d_off of the 
  last dirent ?

 Looks like, 9223372036854775807 is just LLONG_MAX.

 I can not reproduce that (on openSUSE 11.1). I also don't see
 the _llseek() calls.

 weird. no btrfs issue then !?


 open(., O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
 fstat(3, {st_dev=makedev(0, 18), ...
 getdents64(3, {
  {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=.}
  {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=..}
  {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name=a}
  {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name=b}
  {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name=c}
  {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name=test}
  {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
 d_name=linux}
 }, 4096) = 176
 getdents64(3, {}, 4096) = 0
 close(3)

 This is with today's git kernel and today's standalone btrfs unstable.

 You are using the distro kernel and compile the standalone btrfs module?

 yes.
 to be honest, i`m slightly newer than 11.1 (did zypper dup to latest factory
 some days ago)

 linux:~ # bash -version
 GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
 Copyright (C) 2007 Free Software Foundation, Inc.

That is still the same bash, the one you use is a 32bit version. Do
you run a 32 bit kernel too? I could try that on a 32 bit box then.

Thanks,
Kay


Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 14:23, Christoph Hellwig h...@infradead.org wrote:
 === Notes on support for multiple devices for a single filesystem ===

 == Intro ==

 Btrfs (and an experimental XFS version) can support multiple underlying block
 devices for a single filesystem instance in a generalized and flexible way.

 Unlike the support for external log devices in ext3, jfs, reiserfs, XFS, and
 the special real-time device in XFS, all data and metadata may be spread over a
 potentially large number of block devices, and not just one (or two).


 == Requirements ==

 We want a scheme to support these complex filesystem topologies in a way
 that is

  a) easy to setup and non-fragile for the users
  b) scalable to a large number of disks in the system
  c) recoverable without requiring user space running first
  d) generic enough to work for multiple filesystems or other consumers

 Requirement a) means that a multiple-device filesystem should be mountable
 by a simple fstab entry (UUID/LABEL or some other cookie) which continues
 to work when the filesystem topology changes.

 Requirement b) implies we must not do a scan over all available block devices
 in large systems, but use an event-based callout on detection of new block
 devices.

 Requirement c) means there must be some way to add devices to a filesystem
 by kernel command lines, even if this is not the default way, and might
 require additional knowledge from the user / system administrator.

 Requirement d) means that we should not implement this mechanism inside a
 single filesystem.


 == Prior art ==

 * External log and realtime volume

 The most common way to specify the external log device and the XFS real time
 device is to have a mount option that contains the path to the block special
 device for it.  This variant means a mount option is always required, and
 requires that the device name doesn't change, which is enough with udev-generated
 unique device names (/dev/disk/by-{label,uuid}).

 An alternative way, optionally supported by ext3 and reiserfs and
 exclusively supported by jfs is to open the journal device by the device
 number (dev_t) of the block special device.  While this doesn't require
 an additional mount option when the device number is stored in the filesystem
 superblock it relies on the device number being stable which is getting
 increasingly unlikely in complex storage topologies.


 * RAID (MD) and LVM

 Software RAID and volume managers, although not strictly filesystems,
 have a very similar problem finding their devices.  The traditional
 solution used for early versions of the Linux MD driver and LVM version 1
 was to hook into the partition scanning code and add devices with the
 right partition type to a kernel-internal list of potential RAID / LVM
 devices.  This approach has the advantage of being simple to implement,
 fast, reliable and not requiring additional user space programs in the boot
 process.  The downside is that it only works with specific partition table
 formats that allow specifying a partition type, and doesn't work with
 unpartitioned disks at all.  Recent MD setups and LVM2 thus move the scanning
 to user space, typically using a command iterating over all block device
 nodes and performing the format-specific scanning.  While this is more flexible
 than the in-kernel scanning, it scales very badly to a large number of
 block devices, and requires additional user space commands to run early
 in the boot process.  A variant of this scheme runs a scanning callout
 from udev once disk devices are detected, which avoids the scanning overhead.


 == High-level design considerations ==

 Due to requirement b) we need a layer that finds devices for a single
 fstab entry.  We can either do this in user space, or in kernel space.  As we've
 traditionally always done UUID/LABEL to device mapping in userspace, and we
 already have libvolume_id and libblkid dealing with the specialized case
 of UUID/LABEL to single device mapping I would recommend to keep doing
 this in user space and try to reuse the libvolume_id / libblkid.

 There are two options to perform the assembly of the device list for
 a filesystem:

  1) whenever libvolume_id / libblkid find a device detected as a multi-device
     capable filesystem it gets added to a list of all devices of this
     particular filesystem type.
     At mount time, mount(8) or a mount.fstype helper calls out to the
     libraries to get a list of devices belonging to this filesystem
     type and translates them to device names, which can be passed to
     the kernel on the mount command line.

     Advantage:  Requires a mount.fstype helper or fs-specific knowledge
     in mount(8).
     Disadvantages:  Requires libvolume_id / libblkid to keep state.

  2) whenever libvolume_id / libblkid find a device detected as a multi-device
     capable filesystem they call into the kernel through an ioctl / sysfs /
     etc to add it to a list

Re: Notes on support for multiple devices for a single filesystem

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 21:58, Chris Mason chris.ma...@oracle.com wrote:
 On Wed, 2008-12-17 at 11:53 -0800, Andrew Morton wrote:

 One thing I've never seen comprehensively addressed is: why do this in
 the filesystem at all?  Why not let MD take care of all this and
 present a single block device to the fs layer?

 Lots of filesystems are violating this, and I'm sure the reasons for
 this are good, but this document seems like a suitable place in which to
 briefly describe those reasons.

 I'd almost rather see this doc stick to the device topology interface in
 hopes of describing something that RAID and MD can use too.  But just to
 toss some information into the pool:

 * When moving data around (raid rebuild, restripe, pvmove etc), we want
 to make sure the data read off the disk is correct before writing it to
 the new location (checksum verification).

 * When moving data around, we don't want to move data that isn't
 actually used by the filesystem.  This could be solved via new APIs, but
 keeping it crash safe would be very tricky.

 * When checksum verification fails on read, the FS should be able to ask
 the raid implementation for another copy.  This could be solved via new
 APIs.

 * Different parts of the filesystem might want different underlying raid
 parameters.  The easiest example is metadata vs data, where a 4k
 stripesize for data might be a bad idea and a 64k stripesize for
 metadata would result in many more rmw cycles.

 * Sharing the filesystem transaction layer.  LVM and MD have to pretend
 they are a single consistent array of bytes all the time, for each and
 every write they return as complete to the FS.

 By pushing the multiple device support up into the filesystem, I can
 share the filesystem's transaction layer.  Work can be done in larger
 atomic units, and the filesystem will stay consistent because it is all
 coordinated.

 There are other bits and pieces like high speed front end caching
 devices that would be difficult in MD/LVM, but since I don't have that
 coded yet I suppose they don't really count...
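To make the quoted "ask for another copy" point concrete, here is a minimal Python sketch of checksum-verified reads with fallback to a second copy. It is an illustration under assumptions, not btrfs code: zlib.crc32 stands in for whatever checksum the filesystem really uses, and read_verified() is a made-up helper name.

```python
import zlib

def read_verified(copies, expected_crc):
    """Return the first copy whose CRC matches expected_crc.

    `copies` is an iterable of bytes objects, one per mirror. zlib.crc32 is
    only a stand-in checksum for this sketch.
    """
    for data in copies:
        if zlib.crc32(data) & 0xFFFFFFFF == expected_crc:
            return data
    raise IOError("no copy passed checksum verification")

# Illustrative use: the first mirror is corrupted, the second one is good.
good = b"block contents"
crc = zlib.crc32(good) & 0xFFFFFFFF
recovered = read_verified([b"block c0ntents", good], crc)
print(recovered == good)
```

A block-level RAID layer that does not know the checksum cannot make this choice by itself, which is the argument made in the quoted text.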

Features like the very nice and useful directory-based snapshots would
also not be possible with simple block-based multi-devices, right?

Kay


Re: weird bash autocomplete issue

2008-12-17 Thread Kay Sievers
On Wed, Dec 17, 2008 at 15:46, Kay Sievers kay.siev...@vrfy.org wrote:
 On Wed, Dec 17, 2008 at 15:17, Chris Mason chris.ma...@oracle.com wrote:
 On Wed, 2008-12-17 at 14:59 +0100, Kay Sievers wrote:
 On Wed, Dec 17, 2008 at 09:45, Roland devz...@web.de wrote:
  On Tue, 2008-12-16 at 22:41 +0100, Kay Sievers wrote:
 
   open(., O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
   fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555, 
   st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,  
   st_size=18,
   st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37, 
   st_ctime=2008/12/16-21:32:37}) = 0
   getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, 
   d_name=.} {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24,  
   d_name=..}
   {d_ino=257, d_off=3, d_type=DT_DIR, d_reclen=24,  d_name=test}
   {d_ino=258, d_off=9223372036854775807, d_type=DT_DIR,  d_reclen=32,
   d_name=linux}}, 4096) = 104
   _llseek(3, 3, [3], SEEK_SET)= 0
   getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, 
   d_reclen=32, d_name=linux}}, 4096) = 32
 
  On Tue, Dec 16, 2008 at 22:26,  devz...@web.de wrote:
   i assume it has something to do with the large value for d_off of the 
   
   last dirent ?
 
  Looks like, 9223372036854775807 is just LLONG_MAX.
 
  I can not reproduce that (on openSUSE 11.1). I also don't see
  the _llseek() calls.
 
  weird. no btrfs issue then !?
 
 
  open(., O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
  fstat(3, {st_dev=makedev(0, 18), ...
  getdents64(3, {
   {d_ino=260, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=.}
   {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=..}
   {d_ino=261, d_off=3, d_type=DT_REG, d_reclen=24, d_name=a}
   {d_ino=262, d_off=4, d_type=DT_REG, d_reclen=24, d_name=b}
   {d_ino=263, d_off=5, d_type=DT_REG, d_reclen=24, d_name=c}
   {d_ino=264, d_off=6, d_type=DT_DIR, d_reclen=24, d_name=test}
   {d_ino=265, d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32,
  d_name=linux}
  }, 4096) = 176
  getdents64(3, {}, 4096) = 0
  close(3)
 
  This is with today's git kernel and today's standalone btrfs unstable.
 
  You are using the distro kernel and compile the standalone btrfs module?
 
  yes.
  to be honest, i`m slightly newer than 11.1 (did zypper dup to latest 
  factory
  some days ago)
 
  linux:~ # bash -version
  GNU bash, version 3.2.39(1)-release (i586-suse-linux-gnu)
  Copyright (C) 2007 Free Software Foundation, Inc.

 That is still the same bash, the one you use is a 32bit version. Do
 you run a 32 bit kernel too? I could try that on a 32 bit box then.

 At least on my 32 bit box, tab completion works fine.

 It works fine here too on 64 bit. I'll try with openSUSE 11.1 on a
 32bit box later tonight.

 But, the d_off of
 LLONG_MAX comes from btrfs_readdir().  Git had a feature where it would
 loop infinitely over a directory in some cases and this was my
 workaround.

 There are other filesystems doing the same, usually with 32bit int max
 instead of 64 bit int max, I guess that should work fine.

 This should be fixed in git by now, so I can drop it if that really is
 causing problems in bash.

 I'll come back if I can reproduce it with the same environment Roland is 
 using.

I see the same issue on x86 32 bit: there is an additional __llseek()
between the getdents64() calls, and the last entry returned by readdir is
ignored.

If I change the returned LLONG_MAX to LONG_MAX in inode.c, it all
works fine, and the __llseek() disappears.
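As a rough illustration of why the 64-bit value can trip a 32-bit reader while LONG_MAX does not, here is a small Python sketch. It assumes (the thread does not confirm this) that somewhere on the 32-bit readdir path d_off has to fit into a signed 32-bit field; fits_signed() is a made-up helper for the example.

```python
# Sketch only: checks which d_off values fit a signed integer of a given width.
LLONG_MAX = 2**63 - 1      # d_off reported for the last entry in the traces
LONG_MAX_32 = 2**31 - 1    # LONG_MAX on a 32-bit x86 box

def fits_signed(value, bits):
    """Return True if value is representable as a signed integer of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi

for name, value in (("LLONG_MAX", LLONG_MAX), ("LONG_MAX (32 bit)", LONG_MAX_32)):
    print(name, value, "fits in 32 bits:", fits_signed(value, 32))
```

Under that assumption, the value 9223372036854775807 from the strace output is out of range for a 32-bit field, while the LONG_MAX replacement is not.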

Kay


Re: weird bash autocomplete issue

2008-12-16 Thread Kay Sievers
On Tue, Dec 16, 2008 at 21:46,  devz...@web.de wrote:
 On Tue, Dec 16, 2008 at 20:37, Roland devz...@web.de wrote:
  i have come across a weird autocomplete issue i assume it is related to
  btrfs.
 
  let`s have some dirs:
 
  /non-btrfs-mount
./linux
./testdir
 
  /btrfs-mount
./linux
./testdir
 
  now, if i do cd t<tab> in /non-btrfs-mount, t autocompletes to testdir
  same for l<tab>inux - bash autocompletes as expected.
 
  now, the weird thing is, that on /btrfs-mount this behaves different.
 
  autocompletion for testdir works, but not for linux dir. weird.
 
  can someone reproduce this ?

 Open another shell, find the bash process pid of the first shell with:
   ps afx
 and do:
   strace -p pid
 Go back to the first shell, hit tab, and the trace should show
 what's going on. You see a significant difference there?


 ok, here we go (i hope i did not cut important parts).
 i don`t see the real issue, but i did another interesting finding - see below


 bad (cd l<tab>):

 open(., O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
 fstat64(3, {st_dev=makedev(0, 19), st_ino=256, st_mode=S_IFDIR|0555, 
 st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=18, 
 st_atime=2008/12/16-21:32:38, st_mtime=2008/12/16-21:32:37, 
 st_ctime=2008/12/16-21:32:37}) = 0
 getdents64(3, {{d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=.} 
 {d_ino=256, d_off=2, d_type=DT_DIR, d_reclen=24, d_name=..} {d_ino=257, 
 d_off=3, d_type=DT_DIR, d_reclen=24, d_name=test} {d_ino=258, 
 d_off=9223372036854775807, d_type=DT_DIR, d_reclen=32, d_name=linux}}, 
 4096) = 104
 _llseek(3, 3, [3], SEEK_SET)= 0
 getdents64(3, {{d_ino=258, d_off=9223372036854775807, d_type=DT_DIR, 
 d_reclen=32, d_name=linux}}, 4096) = 32

On Tue, Dec 16, 2008 at 22:26,  devz...@web.de wrote:
 i assume it has something to do with the large value for d_off of the last 
 dirent ?

Looks like, 9223372036854775807 is just LLONG_MAX.

Kay


Re: Btrfs trees for linux-next

2008-12-15 Thread Kay Sievers
On Mon, Dec 15, 2008 at 22:03, Andreas Dilger adil...@sun.com wrote:
 On Dec 11, 2008  09:43 -0500, Chris Mason wrote:
 The multi-device code uses a very simple brute force scan from userland
 to populate the list of devices that belong to a given FS.  Kay Sievers
 has some ideas on hotplug magic to make this less dumb.  (The scan isn't
 required for single device filesystems).

 This should use libblkid to do the scanning of the devices, and it can
 cache the results for efficiency.  Best would be to have the same LABEL+UUID
 for all devices in the same filesystem, and then once any of these devices
 are found the mount.btrfs code can query the rest of the devices to find
 the remaining parts of the filesystem.

Which is another way to do something you should not do that way in the
first place, just with a library instead of your own code.

Brute-force scanning /dev with a single thread will not work reliably
in many setups we need to support. Sure, it's good to have it for a
rescue system, and it will work fine on your workstation, but definitely
not for boxes with many devices where you don't know how they behave.

Just do:
  $ modprobe scsi_debug max_luns=8 num_parts=2
  $ echo 1 > /sys/module/scsi_debug/parameters/every_nth
  $ echo 4 > /sys/module/scsi_debug/parameters/opts

  $ ls -l /sys/class/block/ | wc -l
  45
and then call any binary doing /dev scanning, and wait (in this case)
~2 hours for it to return.
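For anyone who wants the device count without parsing ls output, a minimal Python sketch follows. It only lists names from sysfs and never opens a device node, so it does not run into the slow-device problem itself; list_block_devices() is a made-up helper name. Note that "ls -l | wc -l" also counts the "total" line, so the figure above is one higher than the number of entries.

```python
import os

def list_block_devices(sysfs_dir="/sys/class/block"):
    """Return the sorted kernel names listed under sysfs_dir.

    Returns an empty list if the directory is missing or unreadable.
    """
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []

names = list_block_devices()
print(len(names), "block devices known to the kernel")
```

A sequential scanner would then open and read every one of these nodes in turn, which is where a single misbehaving device stalls the whole run.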

Also, the blkid cache file uses major/minor numbers or kernel device
names, which will also not help in many setups we have to support
today.

The original btrfs topic, leading to this, is here:
  http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg01048.html

Thanks,
Kay


Re: btrfs and hotplug, auto-assembly, auto-setup, ...

2008-12-10 Thread Kay Sievers
On Thu, Dec 11, 2008 at 02:30, Chris Mason [EMAIL PROTECTED] wrote:
 On Tue, 2008-12-09 at 19:02 +0100, Kay Sievers wrote:

 At a first look, it looks very promising, and I really like the idea
 that the state of the (possibly incomplete) device tree is kept in the
 kernel, and not somewhere in a file in userspace, like we usually see
 for all sorts of multi-volume/multi-device setups. It should make
 things much easier than usual.


 I hope so, at least it's the only way I can keep my brain wrapped around
 it.

Yeah, it makes a lot of sense.

 Like with every other subsystem, people will expect btrfs to just work
 with hotpluggable devices, without much configuration and explicit
 setup after device connect. To assemble a mountable volume, we will
 need to find the (possibly several independent) devices containing the
 btrfs data.

 I did somewhat have hotplug in mind, there is btrfsctl -a to scan all
 of /dev and btrfsctl -A to scan a single device.

That works fine here. We just need to offer some non-sequential
scanning for some setups, to be reliable. But that should be fine, if
we find a way to plug the information together.

 Now that I have something close to a stable super block location and
 magic, I think the plan below is pretty good.  The majority of my plan
 here was to make a simple ioctl that hotplug could trigger, and let
 someone who knew hotplug better make suggestions on the best way to
 present the information.

I have had the btrfs detection code in udev for a while, to be able to
test it, and I'm tracking the changes.

After the metadata is finalized, I will come up with a few working
examples of how we could make this information easily available, and
possibly integrate it into the tools, and we can decide what we think
is the best.

One thing I'd like to check now, to see if I got it right - the volume that
gets mounted has:
  btrfs_super_block.fsid (the volume, may be used for mount-by-uuid)
  btrfs_super_block.label (the volume, may be used for mount-by-label)

The devices the volume is assembled from, which can be several, have:
  btrfs_super_block.dev_item.uuid (the device uuid, not used in userspace)
  btrfs_super_block.dev_item.fsid (the volume uuid, matches
  btrfs_super_block.fsid)

Is this correct?
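If that reading is right, grouping devices into volumes could look roughly like the Python sketch below. This is only an illustration of the field relationships listed above: DevSuper and group_by_volume() are made-up names, the UUID strings are placeholders, and nothing here reads a real super block.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DevSuper:
    """Hypothetical subset of the fields read from one device's super block."""
    node: str       # device node, illustrative only
    fsid: str       # btrfs_super_block.fsid (volume uuid)
    label: str      # btrfs_super_block.label
    dev_uuid: str   # btrfs_super_block.dev_item.uuid
    dev_fsid: str   # btrfs_super_block.dev_item.fsid

def group_by_volume(supers):
    """Map each volume fsid to the device nodes whose dev_item.fsid matches it."""
    volumes = defaultdict(list)
    for s in supers:
        if s.dev_fsid != s.fsid:
            continue  # inconsistent super block, skipped in this sketch
        volumes[s.fsid].append(s.node)
    return dict(volumes)

# Placeholder values, not real UUIDs.
supers = [
    DevSuper("/dev/sdb", "fs-1", "data", "dev-a", "fs-1"),
    DevSuper("/dev/sdc", "fs-1", "data", "dev-b", "fs-1"),
    DevSuper("/dev/sdd", "fs-2", "scratch", "dev-c", "fs-2"),
]
print(group_by_volume(supers))
```

Each hotplug event would add one such record, so the grouping can be built up incrementally instead of by scanning all of /dev.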

Thanks,
Kay