Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-13 Thread Goffredo Baroncelli
On 2015-06-13 11:35, Anand Jain wrote:
> 
> Thanks for your reply Andrei and Goffredo. more below...
> 
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> В Fri, 12 Jun 2015 21:16:30 +0800 Anand Jain
>>>  пишет:
>>> 
>>>> 
>>>> 
>>>> BTRFS_IOC_DEVICES_READY is to check if all the required
>>>> devices are known by the btrfs kernel, so that
>>>> admin/system-application could mount the FS. It is checked
>>>> against a device in the argument.
>>>> 
>>>> However the actual implementation is bit more than just that, 
>>>> in the way that it would also scan and register the device 
>>>> provided in the argument (same as btrfs device scan subcommand 
>>>> or BTRFS_IOC_SCAN_DEV ioctl).
>>>> 
>>>> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl, 
>>>> but its a write command as well.
>>>> 
>>>> Next, since in the kernel we only check if total_devices (read
>>>> from SB)  is equal to num_devices (counted in the list) to
>>>> state the status as 0 (ready) or 1 (not ready). But this does
>>>> not work in rest of the device pool state like missing, 
>>>> seeding, replacing since total_devices is actually not equal to
>>>> num_devices in these state but device pool is ready for the
>>>> mount and its a bug which is not part of this discussions.
>>>> 
>>>> 
>>>> Questions:
>>>> 
>>>> - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and 
>>>> register the device provided (same as btrfs device scan command
>>>> or the BTRFS_IOC_SCAN_DEV ioctl) OR can BTRFS_IOC_DEVICES_READY
>>>> be read-only ioctl interface to check the state of the device
>>>> pool. ?
>>>> 
>>> 
>>> udev is using it to incrementally assemble multi-device btrfs, so
>>> in this case I think it should.
> 
> Nice. Thanks for letting me know this.
> 
>> I agree, the ioctl name is confusing, but unfortunately this is an
>> API and it has to be stay here forever. Udev uses it, so we know
>> for sure that it is widely used.
> 
> ok. what goes in stays there forever. its time to update the man page
> rather.
> 
>>> Are there any other users?
>>> 
>>>> - If the the device in the argument is already mounted, can it
>>>> straightaway return 0 (ready) ? (as of now it would again
>>>> independently read the SB determine total_devices and check
>>>> against num_devices.
>>>> 
>>> 
>>> I think yes; obvious use case is btrfs mounted in initrd and
>>> later coldplug. There is no point to wait for anything as
>>> filesystem is obviously there.
>>> 
> 
> There is little difference. If the device is already mounted. And
> there are two device paths for the same device PA and PB. The path as
> last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)' or 'btrfs
> device ready (BTRFS_IOC_DEVICES_READY)' will be shown in the 'btrfs
> filesystem show' or '/proc/self/mounts' output. It does not mean that
> btrfs kernel will close the first device path and reopen the 2nd
> given device path, it just updates the device path in the kernel.
> 
> Further, the problem will be more intense in this eg. if you use dd
> and copy device A to device B. After you mount device A, by just
> providing device B in the above two commands you could let kernel
> update the device path, again all the IO (since device is mounted)
> are still going to the device A (not B), but /proc/self/mounts and
> 'btrfs fi show' shows it as device B (not A).
> 
> Its a bug. very tricky to fix.

In the past [*] I proposed a mount.btrfs helper . I tried to move the logic 
outside the kernel.
I think that the problem is that we try to manage all these cases from a device 
point of view: when a device appears, we register the device and we try to 
mount the filesystem... This works very well when there is 1-volume filesystem. 
For the other cases there is a mess between the different layers:
- kernel
- udev/systemd
- initrd logic

My attempt followed a different idea: the mount helper waits the devices if 
needed, or if it is the case it mounts the filesystem in degraded mode. All 
devices are passed as mount arguments (--device=/dev/sdX), there is no a device 
registration: this avoids all these problems.

[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

back to your questions

> - we can't return -EBUSY 

Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-14 Thread Goffredo Baroncelli
On 2015-06-14 06:05, Duncan wrote:
> Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as
> excerpted:
> 
>> My attempt followed a different idea: the mount helper waits the devices
>> if needed, or if it is the case it mounts the filesystem in degraded
>> mode.
>> All devices are passed as mount arguments (--device=/dev/sdX), there is
>> no a device registration: this avoids all these problems.
>>
>> [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
> 
> But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb 
> was slow to respond on my last boot, and currently appears as /dev/sdf, 
> with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Please give a look to my patch.

You may mount the filesystem in different way:
- by device (/dev/sdxxx)
- by UUID (UUID=)
- by LABEL (LABEL=)

The helper finds the right devices and (eventually) waits for the other devices.
When it has collected all the devices, these are passed to the kernel via 
the "device=/dev/sdx" mount option. So the registration would not be needed 
anymore.

> 
> Tho if /dev/disk/by-*/* works, I could use that.  Tho AFAIK it's udev 
> that fills that in, so udev would be necessary.

I never wrote that udev is not necessary. I think only that relying to udev
to handling a multi-volume filesystem is too complicated. The responsibility 
is spread in too much layer.



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-17 Thread Goffredo Baroncelli
On 2015-06-15 19:38, Lennart Poettering wrote:
> On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreij...@inwind.it) wrote:
> 
>> On 2015-06-15 12:46, Lennart Poettering wrote:
>>> On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:
>>>
>>>>> Further, the problem will be more intense in this eg. if you use dd
>>>>> and copy device A to device B. After you mount device A, by just
>>>>> providing device B in the above two commands you could let kernel
>>>>> update the device path, again all the IO (since device is mounted)
>>>>> are still going to the device A (not B), but /proc/self/mounts and
>>>>> 'btrfs fi show' shows it as device B (not A).
>>>>>
>>>>> Its a bug. very tricky to fix.
>>>>
>>>> In the past [*] I proposed a mount.btrfs helper . I tried to move the 
>>>> logic outside the kernel.
>>>> I think that the problem is that we try to manage all these cases
>>>> from a device point of view: when a device appears, we register the
>>>> device and we try to mount the filesystem... This works very well
>>>> when there is 1-volume filesystem. For the other cases there is a
>>>> mess between the different layers:
>>>
>>>> - kernel
>>>> - udev/systemd
>>>> - initrd logic
>>>>
>>>> My attempt followed a different idea: the mount helper waits the
>>>> devices if needed, or if it is the case it mounts the filesystem in
>>>> degraded mode. All devices are passed as mount arguments
>>>> (--device=/dev/sdX), there is no a device registration: this avoids
>>>> all these problems.
>>>
>>> Hmm, no. /bin/mount should not block for devices. That's generally
>>> incompatible with how the tool is used, and in particular from
>>> systemd. We would not make use for such a scheme in
>>> systemd. /bin/mount should always be short-running.
>>
>> Apart systemd, which are these incompatibilities ? 
> 
> Well, /bin/mount is not a daemon, and it should not be one.

My helper is not a deamon; you was correct the first time: it blocks until all 
needed/enough devices are appeared.
Anyway this should not be different from mounting a nfs filesystem. Even in 
this case the mount helper blocks until the connection happened. The block time 
is not negligible, even tough not long as a device timeout ... 

> 
>>> I am pretty sure that if such automatic degraded mounting should be
>>> supported, then this should be done with some background storage
>>> daemon that alters the effect of the READY ioctl somehow after the
>>> timeout, and then retriggers the devcies so that systemd takes
>>> note. (or, alternatively: such a scheme could even be implemented all
>>> in kernel, based on some configurable kernel setting...)
>>
>> I recognize that this solution provides the maximum compatibility
>> with the current implementation. However it seems too complex to
>> me. Re-trigging a devices seems to me more a workaround than a
>> solution.
> 
> Well, it's not really ugly. I mean, if the state or properties of a
> device change, then udev should update its information about it, and
> that's done via a retrigger. We do that all the time already, for
> example when an existing loopback device gets a backing file assigned
> or removed. I am pretty sure that loopback case is very close to what
> you want to do here, hence retriggering (either from the kernel side,
> or from userspace), appears like an OK thing to do.

What seems strange to me is that in this case the devices don't have changed 
their status.
How this problem is managed in the md/dm raid cases ?

> 
>> Could a generator do this job ? I.e. this generator (or storage
>> daemon) waits that all (or enough) devices are appeared, then it
>> creates a .mount unit: do you think that it is doable ?
> 
> systemd generators are a way to extend the systemd unit dep tree with
> units. They are very short running, and are executed only very very
> early at boot. They cannot wait for anything, they don#t have access
> to devices and are not run when they are appear.
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] RFC: filter and search journalctl

2015-08-08 Thread Goffredo Baroncelli
On 2015-08-07 11:53, Sebastian Schindler wrote:
> Hi all.
> 
> The journal format offers powerful filter capabilities. Unfortunately this 
> power
> is lost, if you have to use grep to find certain information.
> Example given (unscientific benchmark), count the number of entries for a
> (known) executable:
> 
> 
> $> journalctl --disk-usage
> Archived and active journals take up 344.1M on disk.
> 
> $> $ time (journalctl _EXE=/usr/sbin/dhclient -o verbose | \
>  grep -F _EXE=/usr/sbin/dhclient | wc -l)
> 1233
> 
> real0m0.111s
> user0m0.007s
> sys 0m0.091s
> 
> 
> $> $ time (journalctl -o verbose | grep -F _EXE=/usr/sbin/dhclient | wc 
> -l)
> 1233
> 
> real0m7.515s
> user0m5.088s
> sys 0m6.896s
> 
> 
> This shows that using grep-piping is magnitudes slower than journalctl.

This is due to the fact that the journal file is structured like a database; 
all fields are fully indexed, so journalctl is faster in case of a query like 
KEY=VALUE.

For other kind of search (by regexp, or only by value), journalctl cannot use 
the indexes so it is a lot slower because it has to process all the journal log.

I am curious  if you do 

$ time ( grep -F _EXE=/usr/sbin/dhclient /var/log/journal/*/*| wc -l)

which is the time resulting

> 
> Grep-ing seems to be the only solution to find log entries if you don't fully
> know what you're looking for. For example: You want to see all entries
> containing a certain MESSAGE that gets enriched with additional information
> during the logging process:
> 
> MESSAGE=host  has closed connection 
> 
> At the moment you have no option to look for this kind of information unless
> someone has set something like  MESSAGE_ID you can filter for. There are 
> several
> use cases using this pattern of thinking:
> 
> * there's no option to show all set FIELD keys in the current journal, 
> although
>   this information is encoded in the header of each journal file
> * there's no support for negated filtering, you can't easily hide output of a
>   certain unit which is creating too much noise
> * there's no support for regular expressions (except for the --unit option),
>   this is especially problematic when you're looking for certain MESSAGEs
> * there's no option to show all entries containing a certain field
> * logical expressions are somewhat hard to read/write because parenthesis 
> can't
>   be used to enforce certain logical expressions
> 
> What do you think about a query language for journalctl that allows more
> powerful search options? This could be introduced without ignoring the
> capabilities the journal file format has to offer. Are there maybe already 
> plans
> to introduce something alike into journalctl? Do some people here have
> experience with query languages for such a use case? Things come to mind like
> PCAP filter, SPARQL, Lucene or the SPHINX Query Language.
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] bootctl: default mount point for the ESP partition.

2015-09-01 Thread Goffredo Baroncelli
Hi Kay,


I am playing with bootctl; I discovered that bootctl assume as default mount 
point for the ESP partition the /boot directory. Instead it seems to me that 
the most part of distributions prefers /boot/efi. I am wrong ?

I am working on a patch that automatically check if the /boot/efi/EFI exists, 
in such case bootctl would use /boot/efi as root of the ESP partition. This is 
quite easy; more difficult is to to update systemd-gpt-auto-generator too: it 
would make the same check, however /boot could be another mount: ie

- / -> btrfs subvolume
- /boot -> another btrfs subvolume
- /boot/efi -> ESP partition.

So the check should be made after mounting the /boot subvolume.

Any ideas ?

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] bootctl: default mount point for the ESP partition.

2015-09-01 Thread Goffredo Baroncelli
On 2015-09-01 18:47, Simon McVittie wrote:
> On 01/09/15 17:21, Goffredo Baroncelli wrote:
>> I discovered that bootctl assume as default mount point for the ESP
>> partition the /boot directory. Instead it seems to me that the most part
>> of distributions prefers /boot/efi.
> 
> For some context, the reasoning for /boot/efi is:
> 
> In some distributions (presumably including the (Fedora-based?) ones
> where this feature was developed), /boot is traditionally treated as
> mutable and unpackaged, like /var; so the packages include the kernel in
> /usr or /lib or something, and copy it into /boot. The cost of this is
> one extra copy of the kernel on-disk, which used to be a significant
> amount of space, although on modern disks it doesn't really matter.

On the basis of this "Fedora 22 Multiboot Guide"

https://docs.fedoraproject.org/en-US/Fedora/22/pdf/Multiboot_Guide/Fedora-22-Multiboot_Guide-en-US.pdf

(see page 18 for example) it seems that the ESP is mounted on /boot/efi

I found that only "arch" mounts ESP in /boot; the others (debian, ubuntu, suse) 
use /boot/efi

Here http://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/ 
reports that "...The ESP used for the current boot is automatically mounted to 
/boot,..."; this page was written by Lennart.

However most part of the distributions mount the ESP on /boot/efi...

[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [ANNOUNCE] systemd 219

2015-02-17 Thread Goffredo Baroncelli
Hi Lennart,

On 2015-02-16 23:59, Lennart Poettering wrote:
> * journald now sets the special FS_NOCOW file flag for its
>   journal files. This should improve performance on btrfs, by
>   avoiding heavy fragmentation when journald's write-pattern
>   is used on COW file systems. It degrades btrfs' data
>   integrity guarantees for the files to the same levels as for
>   ext3/ext4 however. This should be OK though as journald does
>   its own data integrity checks and all its objects are
>   checksummed on disk. Also, journald should handle btrfs disk
>   full events a lot more gracefully now, by processing SIGBUS
>   errors, and not relying on fallocate() anymore.

If I read correctly the code, the FS_NOCOW is a temporary workaround, i.e.
when the file is closed (or rotated ?) the FS_NOCOW flags is unset again. 
It is true ?

If so, the time window where a file is un-protect by the checksum is
quite small. I was worried not about the corruption detection but about loosing 
the ability to recover the file from a good copy (if available) in case of 
corruption. 
But this seems limited only when the file is in use (before the next rotation).

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [ANNOUNCE] systemd 219

2015-02-18 Thread Goffredo Baroncelli
Hi Lennart
On 2015-02-18 11:19, Lennart Poettering wrote:
> On Wed, 18.02.15 12:13, Joonas Sarajärvi (m...@iki.fi) wrote:
> 

> 
> FS_NOCOW does no effect btrfs raid settings. If you want this kind of
> data redundancy then it will continue to be available even though we
> set FS_NOCOW now.

Whitout checksum, BTRFS was unable to restore a good copy: in case of 
RAID1 a flip of a bit  makes the two copies different. Only the checksum
allows to detected which is the good copy.

This was already discussed in the thread (see the answers of Zygo and 
Chris Murhpy other than the my one)

http://www.spinics.net/lists/linux-btrfs/msg41024.html


> 
> Lennart
> 
Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 4/4] Document the new parameter CowJournal

2015-03-05 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Document:
- the new parameter CowJournal in journald.conf
- the new options --cow-journal and --no-cow-journal for
  systemd-journal-remote.

Signed-off-by: Goffredo Baroncelli 
---
 man/journald.conf.xml  | 56 ++
 man/systemd-journal-remote.xml | 34 +
 2 files changed, 90 insertions(+)

diff --git a/man/journald.conf.xml b/man/journald.conf.xml
index 364b58f..9be19b8 100644
--- a/man/journald.conf.xml
+++ b/man/journald.conf.xml
@@ -362,6 +362,62 @@
 /dev/console.
   
 
+  
+CowJournal=
+
+   Controls the "no copy on write" attribute for the
+journal file. 
+   If CowJournal=no this attribute is set.
+   If CowJournal=yes this attribute is reset.
+   
+
+   For the BTRFS filesystem the "no copy on write" attribute disables
+   the COW behavior and the checksum feature ; this leads to a faster
+   journal operations. Instead if this attribute is not set, the COW
+   behavior and the checksum feature are enabled but the performances are
+   slower.
+
+   BTRFS uses the checksums to detect a corrupted file; the checksums
+   are also used to automatically correct the file if a good copy is
+   available (as in a RAID1 BTRFS filesystems). Without  checksums, BTRFS
+   is not able to detect and to repair a corrupted file.
+
+   In case of a BTRFS filesystem without redundancy (i.e. single
+   disk), it is safe tCowJournal=no because in any case
+   there is no a good copy available to correct the file.
+
+   Refer to man pages such as 
+   btrfs
+   5 and 
+   chattr
+   1  
+   for more details.
+   
+  
+
+
+
+  
+CowJournal=
+
+Controls the COW/NOCOW attribute for the journal file.
+If CowJournal=no the journal file has the NOCOW
+flags set.
+For the BTRFS filesystem this means that the COW behaviour (Copy
+On Write) is disable but also the checksum protection is also disabled.
+In this case journal operations are faster, but the journal file is not
+protected by the filesystem checksums.
+If CowJournal=yes the journal file has the NOCOW
+flag unset, and it is protected by the BTRFS checksum but the
+perfomance are slower.
+If the the journal file is corrupted on the disk and a good copy
+(raid mirror) is available, the checksum are used by BTRFS to rebuild
+the file content during a SCRUB process. If the checksum is not
+available, the SCRUB process is unable the repair of the journal file.
+In case of a single volume filesystem (no RAID), it is safe to have
+NoCowJournal=yes.
+  
+
 
 
   
diff --git a/man/systemd-journal-remote.xml b/man/systemd-journal-remote.xml
index 2687662..b14f4e3 100644
--- a/man/systemd-journal-remote.xml
+++ b/man/systemd-journal-remote.xml
@@ -258,6 +258,40 @@
   
 
   
+--cow-journal
+--no-cow-journal
+
+   Controls the "no copy on write" attribute for the
+journal file. 
+   If --no-cow-journal is passed, this attribute is set.
+   If --cow-journal is passed, this attribute is reset.
+   
+
+   For the BTRFS filesystem the "no copy on write" attribute disables
+   the COW behavior and the checksum feature ; this leads to a faster
+   journal operations. Instead if this attribute is not set, the COW
+   behavior and the checksum feature are enabled but the performances are
+   slower.
+
+   BTRFS uses the checksums to detect a corrupted file; the checksums
+   are also used to automatically correct the file if a good copy is
+   available (as in a RAID1 BTRFS filesystems). Without  checksums, BTRFS
+   is not able to detect and to repair a corrupted file.
+
+   In case of a BTRFS filesystem without redundancy (i.e. single
+   disk), it is safe to pass --no-cow-journal because in any case
+   there is no a good copy available to correct the file.
+
+   Refer to man pages such as 
+   btrfs
+   5 and 
+   chattr
+   1  
+   for more details.
+   
+  
+
+  
 --seal
 --no-seal
 
-- 
2.1.0

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/4] Add BTRFS COW journal option

2015-03-05 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

The commit 11689d2a force the NOCOW flag of the journal files. This was needed
because systemd-journald has very poor perfomance when the filesytem is
BTRFS due to its the COW behavior.

However removing the COW behavior, the journal file also lost  the
btrfs checksum protection, and this disables the BTRFS capability to rebuild a
corrupted file [1] in a RAID filesystem.

This patch adds the option "CowJournal" to the journald.conf to disable/enable
the NOCOW flag for the journal file.

[1] http://en.wikipedia.org/wiki/Data_scrubbing#Btrfs

Signed-off-by: Goffredo Baroncelli 
---
 src/journal/journal-file.c   | 34 +-
 src/journal/journal-file.h   |  3 +++
 src/journal/journald-gperf.gperf |  1 +
 src/journal/journald-server.c|  8 
 src/journal/journald-server.h|  2 ++
 src/journal/journald.conf|  1 +
 src/journal/sd-journal.c |  2 +-
 7 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/journal/journal-file.c b/src/journal/journal-file.c
index 2845e05..db0fa35 100644
--- a/src/journal/journal-file.c
+++ b/src/journal/journal-file.c
@@ -2529,6 +2529,7 @@ int journal_file_open(
 mode_t mode,
 bool compress,
 bool seal,
+bool allow_cow_journal,
 JournalMetrics *metrics,
 MMapCache *mmap_cache,
 JournalFile *template,
@@ -2558,6 +2559,7 @@ int journal_file_open(
 f->mode = mode;
 
 f->flags = flags;
+f->allow_cow_journal = allow_cow_journal;
 f->prot = prot_from_flags(flags);
 f->writable = (flags & O_ACCMODE) != O_RDONLY;
 #if defined(HAVE_LZ4)
@@ -2603,16 +2605,19 @@ int journal_file_open(
 
 if (f->last_stat.st_size == 0 && f->writable) {
 
-/* Before we write anything, turn off COW logic. Given
- * our write pattern that is quite unfriendly to COW
- * file systems this should greatly improve
- * performance on COW file systems, such as btrfs, at
- * the expense of data integrity features (which
- * shouldn't be too bad, given that we do our own
- * checksumming). */
-r = chattr_fd(f->fd, true, FS_NOCOW_FL);
-if (r < 0)
-log_warning_errno(errno, "Failed to set file 
attributes: %m");
+if (!f->allow_cow_journal) {
+/* Before we write anything, turn off COW logic. Given
+ * our write pattern that is quite unfriendly to COW
+ * file systems this should greatly improve
+ * performance on COW file systems, such as btrfs, at
+ * the expense of data integrity features (which
+ * shouldn't be too bad, given that we do our own
+ * checksumming). */
+r = chattr_fd(f->fd, true, FS_NOCOW_FL);
+if (r < 0)
+log_warning_errno(errno,
+"Failed to set file attributes: %m");
+}
 
 /* Let's attach the creation time to the journal file,
  * so that the vacuuming code knows the age of this
@@ -2773,7 +2778,7 @@ int journal_file_rotate(JournalFile **f, bool compress, 
bool seal) {
  * we archive them */
 old_file->defrag_on_close = true;
 
-r = journal_file_open(old_file->path, old_file->flags, old_file->mode, 
compress, seal, NULL, old_file->mmap, old_file, &new_file);
+r = journal_file_open(old_file->path, old_file->flags, old_file->mode, 
compress, seal, old_file->allow_cow_journal, NULL, old_file->mmap, old_file, 
&new_file);
 journal_file_close(old_file);
 
 *f = new_file;
@@ -2786,6 +2791,7 @@ int journal_file_open_reliably(
 mode_t mode,
 bool compress,
 bool seal,
+bool allow_cow_journal,
 JournalMetrics *metrics,
 MMapCache *mmap_cache,
 JournalFile *template,
@@ -2796,7 +2802,8 @@ int journal_file_open_reliably(
 _cleanup_free_ char *p = NULL;
 
 r = journal_file_open(fname, flags, mode, compress, seal,
-  metrics, mmap_cache, template, ret);
+  allow_cow_journal, metrics, mmap_cache,
+  template, ret);
 if (r != -EBADMSG && /* corrupted */
 r != -ENODATA && /* truncated */
 r != -EHOSTDOWN && /* other machine */
@@ -2838,7 +2845,8 @@ int journal_file_open_reliably(
 log_w

[systemd-devel] [PATCH 3/4] Update the tests

2015-03-05 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Update the tests due to the new interface journal_file_open() and
journal_file_open_reliably().

Signed-off-by: Goffredo Baroncelli 
---
 src/journal/test-journal-flush.c|  2 +-
 src/journal/test-journal-interleaving.c | 11 +++
 src/journal/test-journal-stream.c   |  6 +++---
 src/journal/test-journal-verify.c   |  6 +++---
 src/journal/test-journal.c  | 17 +++--
 5 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/src/journal/test-journal-flush.c b/src/journal/test-journal-flush.c
index 914ca0b..6d5d4d1 100644
--- a/src/journal/test-journal-flush.c
+++ b/src/journal/test-journal-flush.c
@@ -37,7 +37,7 @@ int main(int argc, char *argv[]) {
 assert_se(mkdtemp(dn));
 fn = strappend(dn, "/test.journal");
 
-r = journal_file_open(fn, O_CREAT|O_RDWR, 0644, false, false, NULL, 
NULL, NULL, &new_journal);
+r = journal_file_open(fn, O_CREAT|O_RDWR, 0644, false, false, false, 
NULL, NULL, NULL, &new_journal);
 assert_se(r >= 0);
 
 r = sd_journal_open(&j, 0);
diff --git a/src/journal/test-journal-interleaving.c 
b/src/journal/test-journal-interleaving.c
index 3c70601..e58d2b7 100644
--- a/src/journal/test-journal-interleaving.c
+++ b/src/journal/test-journal-interleaving.c
@@ -52,7 +52,7 @@ noreturn static void log_assert_errno(const char *text, int 
eno, const char *fil
 
 static JournalFile *test_open(const char *name) {
 JournalFile *f;
-assert_ret(journal_file_open(name, O_RDWR|O_CREAT, 0644, true, false, 
NULL, NULL, NULL, &f));
+assert_ret(journal_file_open(name, O_RDWR|O_CREAT, 0644, true, false, 
false, NULL, NULL, NULL, &f));
 return f;
 }
 
@@ -208,7 +208,8 @@ static void test_sequence_numbers(void) {
 assert_se(chdir(t) >= 0);
 
 assert_se(journal_file_open("one.journal", O_RDWR|O_CREAT, 0644,
-true, false, NULL, NULL, NULL, &one) == 0);
+true, false, false, NULL, NULL,
+NULL, &one) == 0);
 
 append_number(one, 1, &seqnum);
 printf("seqnum=%"PRIu64"\n", seqnum);
@@ -225,7 +226,8 @@ static void test_sequence_numbers(void) {
 memcpy(&seqnum_id, &one->header->seqnum_id, sizeof(sd_id128_t));
 
 assert_se(journal_file_open("two.journal", O_RDWR|O_CREAT, 0644,
-true, false, NULL, NULL, one, &two) == 0);
+true, false, false, NULL, NULL,
+one, &two) == 0);
 
 assert_se(two->header->state == STATE_ONLINE);
 assert_se(!sd_id128_equal(two->header->file_id, one->header->file_id));
@@ -256,7 +258,8 @@ static void test_sequence_numbers(void) {
 seqnum = 0;
 
 assert_se(journal_file_open("two.journal", O_RDWR, 0,
-true, false, NULL, NULL, NULL, &two) == 0);
+true, false, false, NULL, NULL,
+NULL, &two) == 0);
 
 assert_se(sd_id128_equal(two->header->seqnum_id, seqnum_id));
 
diff --git a/src/journal/test-journal-stream.c 
b/src/journal/test-journal-stream.c
index 3996e77..3377480 100644
--- a/src/journal/test-journal-stream.c
+++ b/src/journal/test-journal-stream.c
@@ -90,9 +90,9 @@ int main(int argc, char *argv[]) {
 assert_se(mkdtemp(t));
 assert_se(chdir(t) >= 0);
 
-assert_se(journal_file_open("one.journal", O_RDWR|O_CREAT, 0666, true, 
false, NULL, NULL, NULL, &one) == 0);
-assert_se(journal_file_open("two.journal", O_RDWR|O_CREAT, 0666, true, 
false, NULL, NULL, NULL, &two) == 0);
-assert_se(journal_file_open("three.journal", O_RDWR|O_CREAT, 0666, 
true, false, NULL, NULL, NULL, &three) == 0);
+assert_se(journal_file_open("one.journal", O_RDWR|O_CREAT, 0666, true, 
false, false, NULL, NULL, NULL, &one) == 0);
+assert_se(journal_file_open("two.journal", O_RDWR|O_CREAT, 0666, true, 
false, false, NULL, NULL, NULL, &two) == 0);
+assert_se(journal_file_open("three.journal", O_RDWR|O_CREAT, 0666, 
true, false, false, NULL, NULL, NULL, &three) == 0);
 
 for (i = 0; i < N_ENTRIES; i++) {
 char *p, *q;
diff --git a/src/journal/test-journal-verify.c 
b/src/journal/test-journal-verify.c
index 9da120c..71155e4 100644
--- a/src/journal/test-journal-verify.c
+++ b/src/journal/test-journal-verify.c
@@ -55,7 +55,7 @@ static int raw_verify(const char *fn, const char 
*verification_key) {
 JournalFile *f;
 int r;
 
-r = journal_file_open(fn, O_RDONLY, 0666, true, !!verification_key, 
NULL, NULL, NULL, &

[systemd-devel] [RFC][PATCH] Add option to enable COW for journal file

2015-03-05 Thread Goffredo Baroncelli

Hi All,
the enclosed patches add an option to the journald.conf file to allow
a COW behavior for the journal files.

The commit 11689d2a force the NOCOW flag of the journal files. This was 
needed because systemd-journald has very poor performance when the
filesytem is BTRFS due to its the COW behavior.

However removing the COW behavior, the journal file also lost  the
btrfs checksum protection, and this disables the BTRFS capability to rebuild a
corrupted file [1] in a RAID filesystem. So this limits one of the biggest
benefit of BTRFS.

This patch adds the option "CowJournal" to the journald.conf to disable/enable
the NOCOW flag for the journal file, allowing to revert to the old behavior.

These patches are tagged as RFC, because I am not sure about the
naming of the option (now "CowJournal"). I ask some feedback.

Comments are welcome.

[1] http://en.wikipedia.org/wiki/Data_scrubbing#Btrfs

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/4] Update journal-remote

2015-03-05 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Update the journal-remote due to the update of the parameters of
the functions journal_file_open()/journal_file_open_reliably().


Moreover two options are added to systemd-journal-remote:
--cow-journal and --no-cow-journal to
unset/set the "no copy on write" attribute.

Signed-off-by: Goffredo Baroncelli 
---
 src/journal-remote/journal-remote.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/journal-remote/journal-remote.c 
b/src/journal-remote/journal-remote.c
index 8f32a9a..5ccd237 100644
--- a/src/journal-remote/journal-remote.c
+++ b/src/journal-remote/journal-remote.c
@@ -64,6 +64,7 @@ static char* arg_listen_https = NULL;
 static char** arg_files = NULL;
 static int arg_compress = true;
 static int arg_seal = false;
+static int arg_cow_journal = false;
 static int http_socket = -1, https_socket = -1;
 static char** arg_gnutls_log = NULL;
 
@@ -200,6 +201,7 @@ static int open_output(Writer *w, const char* host) {
 r = journal_file_open_reliably(output,
O_RDWR|O_CREAT, 0640,
arg_compress, arg_seal,
+   arg_cow_journal,
&w->metrics,
w->mmap,
NULL, &w->journal);
@@ -1149,6 +1151,8 @@ static void help(void) {
" --listen-https=ADDRListen for HTTPS connections at 
ADDR\n"
"  -o --output=FILE|DIR  Write output to FILE or 
DIR/external-*.journal\n"
" --compress[=BOOL]  XZ-compress the output journal 
(default: yes)\n"
+   " --cow-journal[=BOOL]   Enable/Disable the COW behaviour\n"
+   "for the journal file (default: 
no)\n"
" --seal[=BOOL]  Use event sealing (default: no)\n"
" --key=FILENAME SSL key in PEM format (default:\n"
"\"" PRIV_KEY_FILE "\")\n"
@@ -1179,6 +1183,7 @@ static int parse_argv(int argc, char *argv[]) {
 ARG_CERT,
 ARG_TRUST,
 ARG_GNUTLS_LOG,
+ARG_COW_JOURNAL,
 };
 
 static const struct option options[] = {
@@ -1192,6 +1197,7 @@ static int parse_argv(int argc, char *argv[]) {
 { "output",   required_argument, NULL, 'o'  },
 { "split-mode",   required_argument, NULL, ARG_SPLIT_MODE   },
 { "compress", optional_argument, NULL, ARG_COMPRESS },
+{ "cow-journal",  optional_argument, NULL, ARG_COW_JOURNAL  },
 { "seal", optional_argument, NULL, ARG_SEAL },
 { "key",  required_argument, NULL, ARG_KEY  },
 { "cert", required_argument, NULL, ARG_CERT },
@@ -1347,6 +1353,20 @@ static int parse_argv(int argc, char *argv[]) {
 
 break;
 
+case ARG_COW_JOURNAL:
+if (optarg) {
+r = parse_boolean(optarg);
+if (r < 0) {
+log_error("Failed to parse 
--allow-cow-journal= parameter.");
+return -EINVAL;
+}
+
+arg_cow_journal = !!r;
+} else
+arg_cow_journal = true;
+
+break;
+
 case ARG_SEAL:
 if (optarg) {
 r = parse_boolean(optarg);
-- 
2.1.0

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file

2015-03-06 Thread Goffredo Baroncelli
On 2015-03-06 13:31, Lennart Poettering wrote:
> On Thu, 05.03.15 21:39, Goffredo Baroncelli (kreij...@libero.it) wrote:
> 
>>
>> Hi All,
>> the enclosed patches add an option to the journald.conf file to allow
>> a COW behavior for the journal files.
>>
[...]
> 
> I am pretty strongly against adding an explicit option for this. I
> consider this all a temporary stopgap, until btrfs's autodefrag makes
> the problem go away, and I am very conservative with adding new config
> options where it is already clear that they will eventually be
> useless.
> 
> What I'd be open to is:
> 
> a) beef up the tmpfiles logic to be able to do the equivalent of
>/usr/bin/chattr on files. (By adding a new line type, make "h" or
>so).
> 
> b) remove all code that sets NOCOW explicitly on journal files from
>journald.
> 
> c) instead ship a tmpfiles snippet making use of a) that sets the
>NOCOW flag for /var/lib/journal, which results in the flag being
>inherited by journal files that are created within it.
> 
> If people then want to opt-out of NOCOW, they can simply override the
> tmpfiles snippet and all is good. That way we have a way to configure
> the bit, without actually introducing a high-level config option for
> this.

This seems reasonable; I will work on a proposal like this one.

> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/3] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-08 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
does. Two more commands are added: 'H' and 'h' to set the attributes,
recursively and not.

Signed-off-by: Goffredo Baroncelli 
---
 src/tmpfiles/tmpfiles.c | 155 
 1 file changed, 155 insertions(+)

diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
index c948d4d..cb77047 100644
--- a/src/tmpfiles/tmpfiles.c
+++ b/src/tmpfiles/tmpfiles.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "log.h"
 #include "util.h"
@@ -90,6 +91,8 @@ typedef enum ItemType {
 ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
 RELABEL_PATH = 'z',
 RECURSIVE_RELABEL_PATH = 'Z',
+SET_ATTRIB = 'h',
+RECURSIVE_SET_ATTRIB = 'H',
 } ItemType;
 
 typedef struct Item {
@@ -108,12 +111,15 @@ typedef struct Item {
 usec_t age;
 
 dev_t major_minor;
+int attrib_value;
+int attrib_mask;
 
 bool uid_set:1;
 bool gid_set:1;
 bool mode_set:1;
 bool age_set:1;
 bool mask_perms:1;
+bool attrib_set:1;
 
 bool keep_first_level:1;
 
@@ -762,6 +768,130 @@ static int path_set_acls(Item *item, const char *path) {
 return 0;
 }
 
+static int get_attrib_from_arg(Item *item) {
+struct attributes_list_t { int value; char ch; } ;
+static struct attributes_list_t attributes[] = {
+{ FS_NOATIME_FL, 'A' },   /* do not update atime */
+{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
+{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour 
(directories only) */
+{ FS_APPEND_FL, 'a' },/* writes to file may only 
append */
+{ FS_COMPR_FL, 'c' }, /* Compress file */
+{ FS_NODUMP_FL, 'd' },/* do not dump file */
+{ FS_EXTENT_FL, 'e'}, /* Top of directory 
hierarchies*/
+{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
+{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
+{ FS_SECRM_FL, 's' }, /* Secure deletion */
+{ FS_UNRM_FL, 'u' },  /* Undelete */
+{ FS_NOTAIL_FL, 't' },/* file tail should not be 
merged */
+{ FS_TOPDIR_FL, 'T' },/* Top of directory 
hierarchies*/
+{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
+{ 0, 0 }
+};
+char *p = item->argument;
+enum { MODE_ADD, MODE_DEL, MODE_SET } mode = MODE_ADD;
+int value=0, mask=0;
+
+if (!p) {
+log_error("\"%s\": setting ATTR need an argument", item->path);
+return -1;
+}
+
+if (*p == '+') {
+mode = MODE_ADD;
+p++;
+} else if (*p == '-') {
+mode = MODE_DEL;
+p++;
+} else  if (*p == '=') {
+mode = MODE_SET;
+p++;
+}
+
+if (!*p && mode != MODE_SET) {
+log_error("\"%s\": setting ATTR: argument is empty", 
item->path);
+return -4;
+}
+for ( ; *p ; p++ ) {
+int i;
+for ( i = 0; attributes[i].ch && attributes[i].ch != *p ;i++);
+if (!attributes[i].ch) {
+log_error("\"%s\": setting ATTR: unknown attr '%c'",
+item->path, *p);
+return -2;
+}
+if (mode == MODE_ADD || mode == MODE_SET)
+value |= attributes[i].value;
+else
+value &= ~attributes[i].value;
+mask |= attributes[i].value;
+}
+
+if (mode == MODE_SET) {
+int i;
+for ( i = 0; attributes[i].ch;i++)
+mask |= attributes[i].value;
+}
+
+assert(mask);
+
+item->attrib_mask |= mask;
+item->attrib_value &= ~mask;
+item->attrib_value |= value;
+item->attrib_set = true;
+
+
+return 0;
+
+}
+
+static int path_set_attrib(Item *item, const char *path) {
+int fd, r, f;
+struct stat st;
+
+/* do nothing */
+if (item->attrib_mask == 0 || !item->attrib_set)
+return 0;
+
+if (!lstat(path, &am

[systemd-devel] [PATCH] Allow systemd-tmpfiles to set file/directory attributes

2015-03-08 Thread Goffredo Baroncelli

Hi all,
This set of patches add two new line types to the tmpfiles files format.
These new types of line are 'H' and 'h', and allow to change the file/
directory attributes, like chattr(1) does.

One of the motivation of these patches is to get rid of the commit
11689d2a which force the NOCOW flag for the journal files. This was 
needed because systemd-journald has very poor performance when the
filesytem is BTRFS due to its the COW behavior. My concern is that 
the NOCOW flag also prevent BTRFS to rebuild a corrupted file from a 
good copy if it is available. 

With this patch, now the NOCOW flag can be set by systemd-tmpfiles.
See [1] for further information.

BR
G.Baroncelli

[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 3/3] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-08 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add a new tmpfiles.d snippets to set the NOCOW attributes for the
journal files. This allow better perfomance when the root file system
is BTRFS. Pay attention that the NOCOW flags disables the checksum and
prevent scrub to rebuild a corruputed journal.

Signed-off-by: Goffredo Baroncelli 
---
 tmpfiles.d/journal-nocow.conf | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..43a4f2b
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,12 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+
+# set the journal file as NOCOW; only valid for BTRFS filesystem
+h /var/log/journal/%m - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/3] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-08 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Update the man page of tmpfiles.d(5), to document the new h/H command.

Signed-off-by: Goffredo Baroncelli 
---
 man/tmpfiles.d.xml | 32 
 1 file changed, 32 insertions(+)

diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
index 8815bf9..f9074dd 100644
--- a/man/tmpfiles.d.xml
+++ b/man/tmpfiles.d.xml
@@ -303,6 +303,37 @@
 
 
 
+  h
+  Set file/directory attributes. Lines of this type
+  accept shell-style globs in place of normal path names.
+
+  The format of agrument field is [+-=][aAcCdDeijsStTu]
+  
+
+  The prefix + causes the
+  attribute(s) to be added; - causes the
+  attribute(s) to be removed; =
+  causes the attributes to set exactly as the following letters.
+  The letters 'aAcCdDeijsStTu' select the new
+  attributes for the files, see
+  chattr
+  1 for further information.
+  
+  Passing only = as argument,
+  reset all the file attributes.
+
+  
+
+
+
+  H
+  Recursively set file/directory attributes Lines
+  of this type accept shell-style globs in place of normal
+  path names.
+  
+
+
+
   a
   a+
   Set POSIX ACLs (access control lists). If
@@ -529,6 +560,7 @@
   setfattr1,
   setfacl1,
   getfacl1
+  chattr1
 
   
 
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 3/3] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-09 Thread Goffredo Baroncelli
On 2015-03-08 15:34, Zbigniew Jędrzejewski-Szmek wrote:
> On Sun, Mar 08, 2015 at 12:48:27PM +0100, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli 
>>
>> Add a new tmpfiles.d snippets to set the NOCOW attributes for the
>> journal files. This allow better perfomance when the root file system
>> is BTRFS. Pay attention that the NOCOW flags disables the checksum and
>> prevent scrub to rebuild a corruputed journal.
>>
>> Signed-off-by: Goffredo Baroncelli 
> Please drop the SOB.
> 
>> +# set the journal file as NOCOW; only valid for BTRFS filesystem
>> +h /var/log/journal/%m - - - - +C
> Shouldn't this be 'H'?

Hi,
if a directory is marked as +C, each *new* files inherited the NOCOW flag:

$ mkdir test
$ lsattr -d test/
 test/
$ dd if=/dev/zero of=test/one bs=1M count=1
$ lsattr -a test
 test/.
 test/..
 test/one
$ chattr +C test
$ dd if=/dev/zero of=test/two bs=1M count=1
$ lsattr -a test
---C test/.
 test/..
 test/one
-------C test/two

> Zbyszek
> 
Goffredo


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 2/3] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-09 Thread Goffredo Baroncelli
On 2015-03-08 15:50, Ronny Chevalier wrote:
> 2015-03-08 12:48 GMT+01:00 Goffredo Baroncelli :
>> From: Goffredo Baroncelli 
>>
>> Update the man page of tmpfiles.d(5), to document the new h/H command.
>>
>> Signed-off-by: Goffredo Baroncelli 
> 
> No Signed-off-by.
> 
> Also, why not merge the 3 commits in one ? I don't see why separating
> the man page update of the new feature in another commit is useful ?
> 
>> ---
>>  man/tmpfiles.d.xml | 32 
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
>> index 8815bf9..f9074dd 100644
>> --- a/man/tmpfiles.d.xml
>> +++ b/man/tmpfiles.d.xml
>> @@ -303,6 +303,37 @@
>>  
>>
>>  
>> +  h
>> +  Set file/directory attributes. Lines of this type
>> +  accept shell-style globs in place of normal path names.
>> +
>> +  The format of agrument field is 
>> [+-=][aAcCdDeijsStTu]
> 
> the argument*
ok
> 
>> +  
>> +
>> +  The prefix + causes the
>> +  attribute(s) to be added; - causes the
>> +  attribute(s) to be removed; =
>> +  causes the attributes to set exactly as the following 
>> letters.
>> +  The letters 'aAcCdDeijsStTu' select the new
>> +  attributes for the files, see
>> +  chattr
>> +  1 for further information.
>> +  
>> +  Passing only = as argument,
>> +  reset all the file attributes.
>> +
>> +  
>> +
>> +
>> +
>> +  H
>> +  Recursively set file/directory attributes Lines
> 
> A . is missing before Lines.
thank,  correct
> 
>> +  of this type accept shell-style globs in place of normal
>> +  path names.
>> +  
>> +
>> +
>> +
>>a
>>a+
>>Set POSIX ACLs (access control lists). If
>> @@ -529,6 +560,7 @@
>>> project='man-pages'>setfattr1,
>>> project='man-pages'>setfacl1,
>>> project='man-pages'>getfacl1
>> +  > project='man-pages'>chattr1
>>  
>>
>>
>> --
>> 2.1.4
>>
>> ___
>> systemd-devel mailing list
>> systemd-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 1/3] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-09 Thread Goffredo Baroncelli
On 2015-03-08 15:00, Ronny Chevalier wrote:
> 2015-03-08 12:48 GMT+01:00 Goffredo Baroncelli :
>> From: Goffredo Baroncelli 
>>
> 
> Hi,
> 
>> Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
>> does. Two more commands are added: 'H' and 'h' to set the attributes,
>> recursively and not.
>>
>> Signed-off-by: Goffredo Baroncelli 
> 
> No Signed-off-by in systemd.
> 
>> ---
>>  src/tmpfiles/tmpfiles.c | 155 
>> 
>>  1 file changed, 155 insertions(+)
>>
>> diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
>> index c948d4d..cb77047 100644
>> --- a/src/tmpfiles/tmpfiles.c
>> +++ b/src/tmpfiles/tmpfiles.c
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "log.h"
>>  #include "util.h"
>> @@ -90,6 +91,8 @@ typedef enum ItemType {
>>  ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
>>  RELABEL_PATH = 'z',
>>  RECURSIVE_RELABEL_PATH = 'Z',
>> +SET_ATTRIB = 'h',
>> +RECURSIVE_SET_ATTRIB = 'H',
>>  } ItemType;
>>
>>  typedef struct Item {
>> @@ -108,12 +111,15 @@ typedef struct Item {
>>  usec_t age;
>>
>>  dev_t major_minor;
>> +int attrib_value;
>> +int attrib_mask;
>>
>>  bool uid_set:1;
>>  bool gid_set:1;
>>  bool mode_set:1;
>>  bool age_set:1;
>>  bool mask_perms:1;
>> +bool attrib_set:1;
>>
>>  bool keep_first_level:1;
>>
>> @@ -762,6 +768,130 @@ static int path_set_acls(Item *item, const char *path) 
>> {
>>  return 0;
>>  }
>>
>> +static int get_attrib_from_arg(Item *item) {
>> +struct attributes_list_t { int value; char ch; } ;
>> +static struct attributes_list_t attributes[] = {
>> +{ FS_NOATIME_FL, 'A' },   /* do not update atime */
>> +{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
>> +{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour 
>> (directories only) */
>> +{ FS_APPEND_FL, 'a' },/* writes to file may 
>> only append */
>> +{ FS_COMPR_FL, 'c' }, /* Compress file */
>> +{ FS_NODUMP_FL, 'd' },/* do not dump file */
>> +{ FS_EXTENT_FL, 'e'}, /* Top of directory 
>> hierarchies*/
>> +{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
>> +{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
>> +{ FS_SECRM_FL, 's' }, /* Secure deletion */
>> +{ FS_UNRM_FL, 'u' },  /* Undelete */
>> +{ FS_NOTAIL_FL, 't' },/* file tail should not 
>> be merged */
>> +{ FS_TOPDIR_FL, 'T' },/* Top of directory 
>> hierarchies*/
>> +{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
>> +{ 0, 0 }
>> +};
>> +char *p = item->argument;
>> +enum { MODE_ADD, MODE_DEL, MODE_SET } mode = MODE_ADD;
>> +int value=0, mask=0;
>> +
>> +if (!p) {
>> +log_error("\"%s\": setting ATTR need an argument", 
>> item->path);
>> +return -1;
> 
> Please use errno style error code. In this case -EINVAL is appropriate.
ok
> 
>> +}
>> +
>> +if (*p == '+') {
>> +mode = MODE_ADD;
>> +p++;
>> +} else if (*p == '-') {
>> +mode = MODE_DEL;
>> +p++;
>> +} else  if (*p == '=') {
>> +mode = MODE_SET;
>> +p++;
>> +}
>> +
>> +if (!*p && mode != MODE_SET) {
>> +log_error("\"%s\": setting ATTR: argument is empty", 
>> item->path);
>> +return -4;
> 
> Same here.
ok
> 
>> +}
>> +for ( ; *p ; p++ ) {
>> +int i;
>> +  

Re: [systemd-devel] [PATCH 1/3] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-10 Thread Goffredo Baroncelli
On 2015-03-08 23:06, Lennart Poettering wrote:
> On Sun, 08.03.15 12:48, Goffredo Baroncelli (kreij...@libero.it) wrote:
> 
>>  dev_t major_minor;
>> +int attrib_value;
>> +int attrib_mask;
> 
> "int" appears to be a strange choice for a bitmask. The existing
> chattr_fd() and chattr_path() calls use "unsigned" for this, so this
> should too...

I read the code of chattr, and in this code these values are int.
Checking the definition of the ioctls I found:

#define FS_IOC_SETFLAGS _IOW('f', 2, long)
#define FS_IOC_GETFLAGS _IOR('f', 1, long)

so the ioctl argument seems to be a *signed long value*

Anyway I changed they to unsigned long.



> 
>>  
>>  bool uid_set:1;
>>  bool gid_set:1;
>>  bool mode_set:1;
>>  bool age_set:1;
>>  bool mask_perms:1;
>> +bool attrib_set:1;
>>  
>>  bool keep_first_level:1;
>>  
>> @@ -762,6 +768,130 @@ static int path_set_acls(Item *item, const char *path) 
>> {
>>  return 0;
>>  }
>>  
>> +static int get_attrib_from_arg(Item *item) {
>> +struct attributes_list_t { int value; char ch; } ;
> 
> The _t suffix is usually reserved for typedefs... 
> 
> Also, it appears unnecessary to define a struct here at all, it can
> just be anonymously defined when delcaring the array.
ok
> 
>> +static struct attributes_list_t attributes[] = {
>> +{ FS_NOATIME_FL, 'A' },   /* do not update atime */
>> +{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
>> +{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour 
>> (directories only) */
>> +{ FS_APPEND_FL, 'a' },/* writes to file may 
>> only append */
>> +{ FS_COMPR_FL, 'c' }, /* Compress file */
>> +{ FS_NODUMP_FL, 'd' },/* do not dump file */
>> +{ FS_EXTENT_FL, 'e'}, /* Top of directory 
>> hierarchies*/
>> +{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
>> +{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
>> +{ FS_SECRM_FL, 's' }, /* Secure deletion */
>> +{ FS_UNRM_FL, 'u' },  /* Undelete */
>> +{ FS_NOTAIL_FL, 't' },/* file tail should not 
>> be merged */
>> +{ FS_TOPDIR_FL, 'T' },/* Top of directory 
>> hierarchies*/
>> +{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
>> +{ 0, 0 }
>> +};
> 
> Indenting borked.
> 
> const missing.
ok
> 
>> +char *p = item->argument;
>> +enum { MODE_ADD, MODE_DEL, MODE_SET } mode = MODE_ADD;
> 
> So far we avoided defining enums in single lines, we line-broke them
> nicely, once for each enum value. This should be done here too.
ok
> 
>> +int value=0, mask=0;
> 
> Spaces after and before the "=" please.
> 
>> +
>> +if (!p) {
>> +log_error("\"%s\": setting ATTR need an argument", 
>> item->path);
>> +return -1;
> 
> Please use explicit error codes, like -EINVAL, don't make up numeric
> error codes like -1.
ok
> 
> Also see CODING_STYLE
> 
>> +}
>> +
>> +if (*p == '+') {
>> +mode = MODE_ADD;
>> +p++;
>> +} else if (*p == '-') {
>> +mode = MODE_DEL;
>> +p++;
>> +} else  if (*p == '=') {
>> +mode = MODE_SET;
>> +p++;
>> +}
>> +
>> +if (!*p && mode != MODE_SET) {
>> +log_error("\"%s\": setting ATTR: argument is empty", 
>> item->path);
>> +return -4;
> 
> Error code
ok
> 
>> +}
>> +for ( ; *p ; p++ ) {
> 
> Weird spaces...
ok
> 
>> +int i;
>> +for ( i = 0; attributes[i].ch && attributes[i].ch != *p 
>> ;i++);
> 
> Weird spaces...
> 
> Also, please add an explicit line break before the ";", so that it is
> clear that this is a for loop without a 

[systemd-devel] [PATCH 3/4] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-10 Thread Goffredo Baroncelli
Update the man page of tmpfiles.d(5), to document the new h/H command.
---
 man/tmpfiles.d.xml | 32 
 1 file changed, 32 insertions(+)

diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
index 8815bf9..469deeb 100644
--- a/man/tmpfiles.d.xml
+++ b/man/tmpfiles.d.xml
@@ -303,6 +303,37 @@
 
 
 
+  h
+  Set file/directory attributes. Lines of this type
+  accept shell-style globs in place of normal path names.
+
+  The format of the argument field is 
[+-=][aAcCdDeijsStTu]
+  
+
+  The prefix + causes the
+  attribute(s) to be added; - causes the
+  attribute(s) to be removed; =
+  causes the attributes to set exactly as the following letters.
+  The letters 'aAcCdDeijsStTu' select the new
+  attributes for the files, see
+  chattr
+  1 for further information.
+  
+  Passing only = as argument,
+  reset all the file attributes.
+
+  
+
+
+
+  H
+  Recursively set file/directory attributes. Lines
+  of this type accept shell-style globs in place of normal
+  path names.
+  
+
+
+
   a
   a+
   Set POSIX ACLs (access control lists). If
@@ -529,6 +560,7 @@
   setfattr1,
   setfacl1,
   getfacl1
+  chattr1
 
   
 
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/4] Add change_attr_fd()

2015-03-10 Thread Goffredo Baroncelli
Add change_attr_fd() function to modify the file/directory attribute.
---
 src/shared/util.c | 22 ++
 src/shared/util.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/src/shared/util.c b/src/shared/util.c
index ba035ca..56097ec 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -7832,6 +7832,28 @@ int chattr_path(const char *p, bool b, unsigned mask) {
 return chattr_fd(fd, b, mask);
 }
 
+int change_attr_fd(int fd, unsigned value, unsigned mask) {
+unsigned old_attr, new_attr;
+
+assert(fd >= 0);
+
+if (mask == 0)
+return 0;
+
+if (ioctl(fd, FS_IOC_GETFLAGS, &old_attr) < 0)
+return -errno;
+
+new_attr = (old_attr & ~mask) |(value & mask);
+
+if (new_attr == old_attr)
+return 0;
+
+if (ioctl(fd, FS_IOC_SETFLAGS, &new_attr) < 0)
+return -errno;
+
+return 0;
+}
+
 int read_attr_fd(int fd, unsigned *ret) {
 assert(fd >= 0);
 
diff --git a/src/shared/util.h b/src/shared/util.h
index a83b588..a0bc883 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1052,6 +1052,7 @@ int same_fd(int a, int b);
 
 int chattr_fd(int fd, bool b, unsigned mask);
 int chattr_path(const char *p, bool b, unsigned mask);
+int change_attr_fd(int fd, unsigned value, unsigned mask);
 
 int read_attr_fd(int fd, unsigned *ret);
 int read_attr_path(const char *p, unsigned *ret);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH V2] Allow systemd-tmpfiles to set file/directory attributes

2015-03-10 Thread Goffredo Baroncelli

Hi all,
This set of patches add two new line types to the tmpfiles files format.
These new types of line are 'h' and 'H' (the recursively version), and 
allow to change the file/directory attributes, like chattr(1) does.

One of the motivation of these patches is to get rid of the commit
11689d2a which force the NOCOW flag for the journal files. This was 
needed because systemd-journald has very poor performance when the
filesytem is BTRFS due to its the COW behavior. My concern is that 
the NOCOW flag also prevent BTRFS to rebuild a corrupted file from a 
good copy if it is available. 

With this patch, now the NOCOW flag can be set by systemd-tmpfiles.
See [1] for further information.

BR
G.Baroncelli

Changelog:
v1: first issue
v2: accepted several suggestion on the style; added function 
change_attr_fd(); used the _cleanup_close_; returned
negative errno;


[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
 https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH V2] Allow systemd-tmpfiles to set file/directory attributes

2015-03-10 Thread Goffredo Baroncelli
Please, forgot these patches: there is a bug inside.
Sorry for the noise.

BR
G.Baroncelli

On 2015-03-10 20:36, Goffredo Baroncelli wrote:
> 
> Hi all,
> This set of patches add two new line types to the tmpfiles files format.
> These new types of line are 'h' and 'H' (the recursively version), and 
> allow to change the file/directory attributes, like chattr(1) does.
> 
> One of the motivation of these patches is to get rid of the commit
> 11689d2a which force the NOCOW flag for the journal files. This was 
> needed because systemd-journald has very poor performance when the
> filesytem is BTRFS due to its the COW behavior. My concern is that 
> the NOCOW flag also prevent BTRFS to rebuild a corrupted file from a 
> good copy if it is available. 
> 
> With this patch, now the NOCOW flag can be set by systemd-tmpfiles.
> See [1] for further information.
> 
> BR
> G.Baroncelli
> 
> Changelog:
> v1: first issue
> v2: accepted several suggestion on the style; added function 
> change_attr_fd(); used the _cleanup_close_; returned
> negative errno;
> 
> 
> [1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
>  
> https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-10 Thread Goffredo Baroncelli
Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
does. Two more commands are added: 'H' and 'h' to set the attributes,
recursively and not.
---
 src/tmpfiles/tmpfiles.c | 140 
 1 file changed, 140 insertions(+)

diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
index c948d4d..165605f 100644
--- a/src/tmpfiles/tmpfiles.c
+++ b/src/tmpfiles/tmpfiles.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "log.h"
 #include "util.h"
@@ -90,6 +91,8 @@ typedef enum ItemType {
 ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
 RELABEL_PATH = 'z',
 RECURSIVE_RELABEL_PATH = 'Z',
+SET_ATTRIB = 'h',
+RECURSIVE_SET_ATTRIB = 'H',
 } ItemType;
 
 typedef struct Item {
@@ -108,12 +111,15 @@ typedef struct Item {
 usec_t age;
 
 dev_t major_minor;
+int attrib_value;
+int attrib_mask;
 
 bool uid_set:1;
 bool gid_set:1;
 bool mode_set:1;
 bool age_set:1;
 bool mask_perms:1;
+bool attrib_set:1;
 
 bool keep_first_level:1;
 
@@ -762,6 +768,115 @@ static int path_set_acls(Item *item, const char *path) {
 return 0;
 }
 
+static int get_attrib_from_arg(Item *item) {
+static const struct { int value; char ch; } attributes[] = {
+{ FS_NOATIME_FL, 'A' },   /* do not update atime */
+{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
+{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour (directories 
only) */
+{ FS_APPEND_FL, 'a' },/* writes to file may only append */
+{ FS_COMPR_FL, 'c' }, /* Compress file */
+{ FS_NODUMP_FL, 'd' },/* do not dump file */
+{ FS_EXTENT_FL, 'e'}, /* Top of directory hierarchies*/
+{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
+{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
+{ FS_SECRM_FL, 's' }, /* Secure deletion */
+{ FS_UNRM_FL, 'u' },  /* Undelete */
+{ FS_NOTAIL_FL, 't' },/* file tail should not be merged */
+{ FS_TOPDIR_FL, 'T' },/* Top of directory hierarchies*/
+{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
+{ 0, 0 }
+};
+char *p = item->argument;
+enum {
+MODE_ADD,
+MODE_DEL,
+MODE_SET
+} mode = MODE_ADD;
+unsigned long value = 0, mask = 0;
+
+if (!p) {
+log_error("\"%s\": setting ATTR need an argument", item->path);
+return -EINVAL;
+}
+
+if (*p == '+') {
+mode = MODE_ADD;
+p++;
+} else if (*p == '-') {
+mode = MODE_DEL;
+p++;
+} else  if (*p == '=') {
+mode = MODE_SET;
+p++;
+}
+
+if (!*p && mode != MODE_SET) {
+log_error("\"%s\": setting ATTR: argument is empty", 
item->path);
+return -EINVAL;
+}
+for (; *p ; p++) {
+int i;
+for (i = 0; attributes[i].ch && attributes[i].ch != *p; i++)
+;
+if (!attributes[i].ch) {
+log_error("\"%s\": setting ATTR: unknown attr '%c'", 
item->path, *p);
+return -EINVAL;
+}
+if (mode == MODE_ADD || mode == MODE_SET)
+value |= attributes[i].value;
+else
+value &= ~attributes[i].value;
+mask |= attributes[i].value;
+}
+
+if (mode == MODE_SET) {
+int i;
+for (i = 0; attributes[i].ch; i++)
+mask |= attributes[i].value;
+}
+
+assert(mask);
+
+item->attrib_mask = mask;
+item->attrib_value = value;
+item->attrib_set = true;
+
+return 0;
+
+}
+
+static int path_set_attrib(Item *item, const char *path) {
+_cleanup_close_ int fd = -1;
+int r;
+unsigned f;
+struct stat st;
+
+/* do nothing */
+if (item->attrib_mask == 0 || !item->attrib_set)
+return 0;
+
+if (lstat(path, &st) == 0 &&
+!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
+return 0;
+}
+
+fd = open(path, O_RDONLY|O_NONBLOCK|O_CLOEXEC);
+
+if (fd < 0)
+return log_error_errno(errno, "Cannot open \"%s\": %m", path);
+
+f = item->attrib_value & item->attrib_mask;
+if (!S_ISDIR(st.st_mode))
+f &= ~FS_DIRSYNC_FL;
+r = change_attr_fd(fd, f, item->attrib_mask);
+if (r < 0)
+return log_error_errno(errno,
+"Can

[systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-10 Thread Goffredo Baroncelli
Add a new tmpfiles.d snippets to set the NOCOW attributes for the
journal files. This allow better perfomance when the root file system
is BTRFS. Pay attention that the NOCOW flags disables the checksum and
prevent scrub to rebuild a corrupted journal.
---
 tmpfiles.d/journal-nocow.conf | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..43a4f2b
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,12 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+
+# set the journal file as NOCOW; only valid for BTRFS filesystem
+h /var/log/journal/%m - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/4] Add change_attr_fd()

2015-03-10 Thread Goffredo Baroncelli
Add change_attr_fd() function to modify the file/directory attribute.
---
 src/shared/util.c | 22 ++
 src/shared/util.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/src/shared/util.c b/src/shared/util.c
index ba035ca..56097ec 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -7832,6 +7832,28 @@ int chattr_path(const char *p, bool b, unsigned mask) {
 return chattr_fd(fd, b, mask);
 }
 
+int change_attr_fd(int fd, unsigned value, unsigned mask) {
+unsigned old_attr, new_attr;
+
+assert(fd >= 0);
+
+if (mask == 0)
+return 0;
+
+if (ioctl(fd, FS_IOC_GETFLAGS, &old_attr) < 0)
+return -errno;
+
+new_attr = (old_attr & ~mask) |(value & mask);
+
+if (new_attr == old_attr)
+return 0;
+
+if (ioctl(fd, FS_IOC_SETFLAGS, &new_attr) < 0)
+return -errno;
+
+return 0;
+}
+
 int read_attr_fd(int fd, unsigned *ret) {
 assert(fd >= 0);
 
diff --git a/src/shared/util.h b/src/shared/util.h
index a83b588..a0bc883 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1052,6 +1052,7 @@ int same_fd(int a, int b);
 
 int chattr_fd(int fd, bool b, unsigned mask);
 int chattr_path(const char *p, bool b, unsigned mask);
+int change_attr_fd(int fd, unsigned value, unsigned mask);
 
 int read_attr_fd(int fd, unsigned *ret);
 int read_attr_path(const char *p, unsigned *ret);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH V3] Allow systemd-tmpfiles to set file/directory attributes

2015-03-10 Thread Goffredo Baroncelli

Hi all,
This set of patches add two new line types to the tmpfiles files format.
These new types of line are 'h' and 'H' (the recursively version), and 
allow to change the file/directory attributes, like chattr(1) does.

One of the motivation of these patches is to get rid of the commit
11689d2a which force the NOCOW flag for the journal files. This was 
needed because systemd-journald has very poor performance when the
filesytem is BTRFS due to its the COW behavior. My concern is that 
the NOCOW flag also prevent BTRFS to rebuild a corrupted file from a 
good copy if it is available. 

With this patch, now the NOCOW flag can be set by systemd-tmpfiles.
See [1] for further information.

BR
G.Baroncelli

Changelog:
v1: first issue
v2: accepted several suggestion on the style; added function 
change_attr_fd(); used the _cleanup_close_; returned
negative errno;
v3: swapped arguments of change_attr_fd() in path_set_attrib()


[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html

--
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 3/4] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-12 Thread Goffredo Baroncelli
Update the man page of tmpfiles.d(5), to document the new h/H command.
---
 man/tmpfiles.d.xml | 32 
 1 file changed, 32 insertions(+)

diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
index 8815bf9..469deeb 100644
--- a/man/tmpfiles.d.xml
+++ b/man/tmpfiles.d.xml
@@ -303,6 +303,37 @@
 
 
 
+  h
+  Set file/directory attributes. Lines of this type
+  accept shell-style globs in place of normal path names.
+
+  The format of the argument field is 
[+-=][aAcCdDeijsStTu]
+  
+
+  The prefix + causes the
+  attribute(s) to be added; - causes the
+  attribute(s) to be removed; =
+  causes the attributes to set exactly as the following letters.
+  The letters 'aAcCdDeijsStTu' select the new
+  attributes for the files, see
+  chattr
+  1 for further information.
+  
+  Passing only = as argument,
+  reset all the file attributes.
+
+  
+
+
+
+  H
+  Recursively set file/directory attributes. Lines
+  of this type accept shell-style globs in place of normal
+  path names.
+  
+
+
+
   a
   a+
   Set POSIX ACLs (access control lists). If
@@ -529,6 +560,7 @@
   setfattr1,
   setfacl1,
   getfacl1
+  chattr1
 
   
 
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-12 Thread Goffredo Baroncelli
Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
does. Two more commands are added: 'H' and 'h' to set the attributes,
recursively and not.
---
 src/tmpfiles/tmpfiles.c | 140 
 1 file changed, 140 insertions(+)

diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
index c948d4d..31faeb6 100644
--- a/src/tmpfiles/tmpfiles.c
+++ b/src/tmpfiles/tmpfiles.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "log.h"
 #include "util.h"
@@ -90,6 +91,8 @@ typedef enum ItemType {
 ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
 RELABEL_PATH = 'z',
 RECURSIVE_RELABEL_PATH = 'Z',
+SET_ATTRIB = 'h',
+RECURSIVE_SET_ATTRIB = 'H',
 } ItemType;
 
 typedef struct Item {
@@ -108,12 +111,15 @@ typedef struct Item {
 usec_t age;
 
 dev_t major_minor;
+int attrib_value;
+int attrib_mask;
 
 bool uid_set:1;
 bool gid_set:1;
 bool mode_set:1;
 bool age_set:1;
 bool mask_perms:1;
+bool attrib_set:1;
 
 bool keep_first_level:1;
 
@@ -762,6 +768,115 @@ static int path_set_acls(Item *item, const char *path) {
 return 0;
 }
 
+static int get_attrib_from_arg(Item *item) {
+static const struct { int value; char ch; } attributes[] = {
+{ FS_NOATIME_FL, 'A' },   /* do not update atime */
+{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
+{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour (directories 
only) */
+{ FS_APPEND_FL, 'a' },/* writes to file may only append */
+{ FS_COMPR_FL, 'c' }, /* Compress file */
+{ FS_NODUMP_FL, 'd' },/* do not dump file */
+{ FS_EXTENT_FL, 'e'}, /* Top of directory hierarchies*/
+{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
+{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
+{ FS_SECRM_FL, 's' }, /* Secure deletion */
+{ FS_UNRM_FL, 'u' },  /* Undelete */
+{ FS_NOTAIL_FL, 't' },/* file tail should not be merged */
+{ FS_TOPDIR_FL, 'T' },/* Top of directory hierarchies*/
+{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
+{ 0, 0 }
+};
+char *p = item->argument;
+enum {
+MODE_ADD,
+MODE_DEL,
+MODE_SET
+} mode = MODE_ADD;
+unsigned long value = 0, mask = 0;
+
+if (!p) {
+log_error("\"%s\": setting ATTR need an argument", item->path);
+return -EINVAL;
+}
+
+if (*p == '+') {
+mode = MODE_ADD;
+p++;
+} else if (*p == '-') {
+mode = MODE_DEL;
+p++;
+} else  if (*p == '=') {
+mode = MODE_SET;
+p++;
+}
+
+if (!*p && mode != MODE_SET) {
+log_error("\"%s\": setting ATTR: argument is empty", 
item->path);
+return -EINVAL;
+}
+for (; *p ; p++) {
+int i;
+for (i = 0; attributes[i].ch && attributes[i].ch != *p; i++)
+;
+if (!attributes[i].ch) {
+log_error("\"%s\": setting ATTR: unknown attr '%c'", 
item->path, *p);
+return -EINVAL;
+}
+if (mode == MODE_ADD || mode == MODE_SET)
+value |= attributes[i].value;
+else
+value &= ~attributes[i].value;
+mask |= attributes[i].value;
+}
+
+if (mode == MODE_SET) {
+int i;
+for (i = 0; attributes[i].ch; i++)
+mask |= attributes[i].value;
+}
+
+assert(mask);
+
+item->attrib_mask = mask;
+item->attrib_value = value;
+item->attrib_set = true;
+
+return 0;
+
+}
+
+static int path_set_attrib(Item *item, const char *path) {
+_cleanup_close_ int fd = -1;
+int r;
+unsigned f;
+struct stat st;
+
+/* do nothing */
+if (item->attrib_mask == 0 || !item->attrib_set)
+return 0;
+
+if (lstat(path, &st) == 0 &&
+!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
+return 0;
+}
+
+fd = open(path, O_RDONLY|O_NONBLOCK|O_CLOEXEC);
+
+if (fd < 0)
+return log_error_errno(errno, "Cannot open \"%s\": %m", path);
+
+f = item->attrib_value & item->attrib_mask;
+if (!S_ISDIR(st.st_mode))
+f &= ~FS_DIRSYNC_FL;
+r = change_attr_fd(fd, item->attrib_mask, f);
+if (r < 0)
+return log_error_errno(errno,
+"Can

[systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-12 Thread Goffredo Baroncelli
Add a new tmpfiles.d snippets to set the NOCOW attributes for the
journal files. This allow better perfomance when the root file system
is BTRFS. Pay attention that the NOCOW flags disables the checksum and
prevent scrub to rebuild a corrupted journal.
---
 tmpfiles.d/journal-nocow.conf | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..43a4f2b
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,12 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+
+# set the journal file as NOCOW; only valid for BTRFS filesystem
+h /var/log/journal/%m - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-16 Thread Goffredo Baroncelli
On 2015-03-16 04:12, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Mar 10, 2015 at 09:07:41PM +0100, Goffredo Baroncelli wrote:
>> Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
>> does. Two more commands are added: 'H' and 'h' to set the attributes,
>> recursively and not.
>> ---
>>  src/tmpfiles/tmpfiles.c | 140 
>> 
>>  1 file changed, 140 insertions(+)
>>
>> diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
>> index c948d4d..165605f 100644
>> --- a/src/tmpfiles/tmpfiles.c
>> +++ b/src/tmpfiles/tmpfiles.c
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "log.h"
>>  #include "util.h"
>> @@ -90,6 +91,8 @@ typedef enum ItemType {
>>  ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
>>  RELABEL_PATH = 'z',
>>  RECURSIVE_RELABEL_PATH = 'Z',
>> +SET_ATTRIB = 'h',
>> +RECURSIVE_SET_ATTRIB = 'H',
>>  } ItemType;
>>  
>>  typedef struct Item {
>> @@ -108,12 +111,15 @@ typedef struct Item {
>>  usec_t age;
>>  
>>  dev_t major_minor;
>> +int attrib_value;
>> +int attrib_mask;
>>  
>>  bool uid_set:1;
>>  bool gid_set:1;
>>  bool mode_set:1;
>>  bool age_set:1;
>>  bool mask_perms:1;
>> +bool attrib_set:1;
>>  
>>  bool keep_first_level:1;
>>  
>> @@ -762,6 +768,115 @@ static int path_set_acls(Item *item, const char *path) 
>> {
>>  return 0;
>>  }
>>  
>> +static int get_attrib_from_arg(Item *item) {
>> +static const struct { int value; char ch; } attributes[] = {
>> +{ FS_NOATIME_FL, 'A' },   /* do not update atime */
>> +{ FS_SYNC_FL, 'S' },  /* Synchronous updates */
>> +{ FS_DIRSYNC_FL, 'D' },   /* dirsync behaviour (directories 
>> only) */
>> +{ FS_APPEND_FL, 'a' },/* writes to file may only append 
>> */
>> +{ FS_COMPR_FL, 'c' }, /* Compress file */
>> +{ FS_NODUMP_FL, 'd' },/* do not dump file */
>> +{ FS_EXTENT_FL, 'e'}, /* Top of directory hierarchies*/
>> +{ FS_IMMUTABLE_FL, 'i' }, /* Immutable file */
>> +{ FS_JOURNAL_DATA_FL, 'j' }, /* Reserved for ext3 */
>> +{ FS_SECRM_FL, 's' }, /* Secure deletion */
>> +{ FS_UNRM_FL, 'u' },  /* Undelete */
>> +{ FS_NOTAIL_FL, 't' },/* file tail should not be merged 
>> */
>> +{ FS_TOPDIR_FL, 'T' },/* Top of directory hierarchies*/
>> +{ FS_NOCOW_FL, 'C' }, /* Do not cow file */
>> +{ 0, 0 }
>> +};
>> +char *p = item->argument;
>> +enum {
>> +MODE_ADD,
>> +MODE_DEL,
>> +MODE_SET
>> +} mode = MODE_ADD;
>> +unsigned long value = 0, mask = 0;
>> +
>> +if (!p) {
>> +log_error("\"%s\": setting ATTR need an argument", 
>> item->path);
>> +return -EINVAL;
>> +}
>> +
>> +if (*p == '+') {
>> +mode = MODE_ADD;
>> +p++;
>> +} else if (*p == '-') {
>> +mode = MODE_DEL;
>> +p++;
>> +} else  if (*p == '=') {
>> +mode = MODE_SET;
>> +p++;
>> +}
>> +
>> +if (!*p && mode != MODE_SET) {
>> +log_error("\"%s\": setting ATTR: argument is empty", 
>> item->path);
>> +return -EINVAL;
>> +}
>> +for (; *p ; p++) {
>> +int i;
>> +for (i = 0; attributes[i].ch && attributes[i].ch != *p; i++)
>> +;
> Why not order the table the other way:
> 
> static const int attributes[] = {
>[(uint8_t)'A'] = FS_NOATIME_FL,  /* do not update atime */
>   

Re: [systemd-devel] [PATCH 3/4] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-16 Thread Goffredo Baroncelli
On 2015-03-16 04:24, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Mar 10, 2015 at 09:07:42PM +0100, Goffredo Baroncelli wrote:
>> Update the man page of tmpfiles.d(5), to document the new h/H command.
>> ---
>>  man/tmpfiles.d.xml | 32 
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
>> index 8815bf9..469deeb 100644
>> --- a/man/tmpfiles.d.xml
>> +++ b/man/tmpfiles.d.xml
>> @@ -303,6 +303,37 @@
>>  
>>  
>>  
>> +  h
>> +  Set file/directory attributes. Lines of this type
>> +  accept shell-style globs in place of normal path names.
>> +
>> +  The format of the argument field is 
>> [+-=][aAcCdDeijsStTu]
>> +  
>> +
>> +  The prefix + causes the
>> +  attribute(s) to be added; - causes the
>> +  attribute(s) to be removed; =
>> +  causes the attributes to set exactly as the following 
>> letters.
> What happens if neither of the three prefix lettes is used? This
> should be documented.
ok
> 
>> +  The letters 'aAcCdDeijsStTu' select the new
>  instead of ''.
ok
> 
>> +  attributes for the files, see
>> +  chattr
>> +  1 for further information.
>> +  
>> +  Passing only = as argument,
>> +  reset all the file attributes.
> resets
> 
> So, is this description accurate? Operations on the attributes are
> explicitly limited to the ones corresponding to the letters above (by
> using a mask). But files can have other attributes, and the kernel might
> define new attributes as some point. So maybe add a sentence like
> "When operating on attributes, system-tmpfiles limits itself to the
> attributes corresponding to the letters listed above. All other attributes
> will be left untouched, even with =."
> 
> Zbyszek

You are right, good catch !

> 
>> +
>> +  
>> +
>> +
>> +
>> +  H
>> +  Recursively set file/directory attributes. Lines
>> +  of this type accept shell-style globs in place of normal
>> +  path names.
>> +  
>> +
>> +
>> +
>>a
>>a+
>>Set POSIX ACLs (access control lists). If
>> @@ -529,6 +560,7 @@
>>> project='man-pages'>setfattr1,
>>> project='man-pages'>setfacl1,
>>> project='man-pages'>getfacl1
>> +  > project='man-pages'>chattr1
>>  
>>
>>  
>> -- 
>> 2.1.4
>>
>> ___
>> systemd-devel mailing list
>> systemd-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 3/4] Update the man page of tmpfiles.d(5), to document the new h/H command.

2015-03-16 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Update the man page of tmpfiles.d(5), to document the new h/H command.
---
 man/tmpfiles.d.xml | 36 
 1 file changed, 36 insertions(+)

diff --git a/man/tmpfiles.d.xml b/man/tmpfiles.d.xml
index 8815bf9..a532f91 100644
--- a/man/tmpfiles.d.xml
+++ b/man/tmpfiles.d.xml
@@ -303,6 +303,41 @@
 
 
 
+  h
+  Set file/directory attributes. Lines of this type
+  accept shell-style globs in place of normal path names.
+
+  The format of the argument field is 
[+-=][aAcCdDeijsStTu]
+  
+
+  The prefix + (the default one) causes the
+  attribute(s) to be added; - causes the
+  attribute(s) to be removed; =
+  causes the attributes to set exactly as the following letters.
+  The letters aAcCdDeijsStTu select the new
+  attributes for the files, see
+  chattr
+  1 for further information.
+  
+  Passing only = as argument,
+  resets all the file attributes listed above. It has to be pointed
+  out that the = prefix, limits itself to the
+  attributes corresponding to the letters listed here. All other
+  attributes will be left untouched.
+  
+
+  
+
+
+
+  H
+  Recursively set file/directory attributes. Lines
+  of this type accept shell-style globs in place of normal
+  path names.
+  
+
+
+
   a
   a+
   Set POSIX ACLs (access control lists). If
@@ -529,6 +564,7 @@
   setfattr1,
   setfacl1,
   getfacl1
+  chattr1
 
   
 
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-03-16 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Allow systemd-tmpfiles to set the file/directory attributes, like chattr(1)
does. Two more commands are added: 'H' and 'h' to set the attributes,
recursively and not.
---
 src/tmpfiles/tmpfiles.c | 140 
 1 file changed, 140 insertions(+)

diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
index c948d4d..e1a22f5 100644
--- a/src/tmpfiles/tmpfiles.c
+++ b/src/tmpfiles/tmpfiles.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "log.h"
 #include "util.h"
@@ -90,6 +91,8 @@ typedef enum ItemType {
 ADJUST_MODE = 'm', /* legacy, 'z' is identical to this */
 RELABEL_PATH = 'z',
 RECURSIVE_RELABEL_PATH = 'Z',
+SET_ATTRIB = 'h',
+RECURSIVE_SET_ATTRIB = 'H',
 } ItemType;
 
 typedef struct Item {
@@ -108,12 +111,15 @@ typedef struct Item {
 usec_t age;
 
 dev_t major_minor;
+int attrib_value;
+int attrib_mask;
 
 bool uid_set:1;
 bool gid_set:1;
 bool mode_set:1;
 bool age_set:1;
 bool mask_perms:1;
+bool attrib_set:1;
 
 bool keep_first_level:1;
 
@@ -762,6 +768,115 @@ static int path_set_acls(Item *item, const char *path) {
 return 0;
 }
 
+static int get_attrib_from_arg(Item *item) {
+static const unsigned attributes[] = {
+[(uint8_t)'A'] = FS_NOATIME_FL,/* do not update atime */
+[(uint8_t)'S'] = FS_SYNC_FL,  /* Synchronous updates */
+[(uint8_t)'D'] = FS_DIRSYNC_FL,   /* dirsync behaviour 
(directories only) */
+[(uint8_t)'a'] = FS_APPEND_FL,/* writes to file may only 
append */
+[(uint8_t)'c'] = FS_COMPR_FL, /* Compress file */
+[(uint8_t)'d'] = FS_NODUMP_FL,/* do not dump file */
+[(uint8_t)'e'] = FS_EXTENT_FL,/* Top of directory 
hierarchies*/
+[(uint8_t)'i'] = FS_IMMUTABLE_FL, /* Immutable file */
+[(uint8_t)'j'] = FS_JOURNAL_DATA_FL, /* Reserved for ext3 */
+[(uint8_t)'s'] = FS_SECRM_FL, /* Secure deletion */
+[(uint8_t)'u'] = FS_UNRM_FL,  /* Undelete */
+[(uint8_t)'t'] = FS_NOTAIL_FL,/* file tail should not be 
merged */
+[(uint8_t)'T'] = FS_TOPDIR_FL,/* Top of directory 
hierarchies*/
+[(uint8_t)'C'] = FS_NOCOW_FL, /* Do not cow file */
+};
+const unsigned all_flags = FS_NOATIME_FL|FS_SYNC_FL|FS_DIRSYNC_FL|
+FS_APPEND_FL|FS_COMPR_FL|FS_NODUMP_FL|FS_EXTENT_FL|
+FS_IMMUTABLE_FL|FS_JOURNAL_DATA_FL|FS_SECRM_FL|FS_UNRM_FL|
+FS_NOTAIL_FL|FS_TOPDIR_FL|FS_NOCOW_FL;
+char *p = item->argument;
+enum {
+MODE_ADD,
+MODE_DEL,
+MODE_SET
+} mode = MODE_ADD;
+unsigned long value = 0, mask = 0;
+
+if (!p) {
+log_error("\"%s\": setting ATTR need an argument", item->path);
+return -EINVAL;
+}
+
+if (*p == '+') {
+mode = MODE_ADD;
+p++;
+} else if (*p == '-') {
+mode = MODE_DEL;
+p++;
+} else  if (*p == '=') {
+mode = MODE_SET;
+p++;
+}
+
+if (!*p && mode != MODE_SET) {
+log_error("\"%s\": setting ATTR: argument is empty", 
item->path);
+return -EINVAL;
+}
+for (; *p ; p++) {
+if ((uint8_t)*p > ELEMENTSOF(attributes) || 
attributes[(uint8_t)*p] == 0) {
+log_error("\"%s\": setting ATTR: unknown attr '%c'", 
item->path, *p);
+return -EINVAL;
+}
+if (mode == MODE_ADD || mode == MODE_SET)
+value |= attributes[(uint8_t)*p];
+else
+value &= ~attributes[(uint8_t)*p];
+mask |= attributes[(uint8_t)*p];
+}
+
+if (mode == MODE_SET)
+mask |= all_flags;
+
+assert(mask);
+
+item->attrib_mask = mask;
+item->attrib_value = value;
+item->attrib_set = true;
+
+return 0;
+
+}
+
+static int path_set_attrib(Item *item, const char *path) {
+_cleanup_close_ int fd = -1;
+int r;
+unsigned f;
+struct stat st;
+
+/* do nothing */
+if (item->attrib_mask == 0 || !item->attrib_set)
+  

[systemd-devel] [PATCH 1/4] Add change_attr_fd()

2015-03-16 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add change_attr_fd() function to modify the file/directory attribute.
---
 src/shared/util.c | 22 ++
 src/shared/util.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/src/shared/util.c b/src/shared/util.c
index ba035ca..56097ec 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -7832,6 +7832,28 @@ int chattr_path(const char *p, bool b, unsigned mask) {
 return chattr_fd(fd, b, mask);
 }
 
+int change_attr_fd(int fd, unsigned value, unsigned mask) {
+unsigned old_attr, new_attr;
+
+assert(fd >= 0);
+
+if (mask == 0)
+return 0;
+
+if (ioctl(fd, FS_IOC_GETFLAGS, &old_attr) < 0)
+return -errno;
+
+new_attr = (old_attr & ~mask) |(value & mask);
+
+if (new_attr == old_attr)
+return 0;
+
+if (ioctl(fd, FS_IOC_SETFLAGS, &new_attr) < 0)
+return -errno;
+
+return 0;
+}
+
 int read_attr_fd(int fd, unsigned *ret) {
 assert(fd >= 0);
 
diff --git a/src/shared/util.h b/src/shared/util.h
index a83b588..a0bc883 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -1052,6 +1052,7 @@ int same_fd(int a, int b);
 
 int chattr_fd(int fd, bool b, unsigned mask);
 int chattr_path(const char *p, bool b, unsigned mask);
+int change_attr_fd(int fd, unsigned value, unsigned mask);
 
 int read_attr_fd(int fd, unsigned *ret);
 int read_attr_path(const char *p, unsigned *ret);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-16 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add a new tmpfiles.d snippets to set the NOCOW attributes for the
journal files. This allow better perfomance when the root file system
is BTRFS. Pay attention that the NOCOW flags disables the checksum and
prevent scrub to rebuild a corrupted journal.
---
 tmpfiles.d/journal-nocow.conf | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..43a4f2b
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,12 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+
+# set the journal file as NOCOW; only valid for BTRFS filesystem
+h /var/log/journal/%m - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH V4] Allow systemd-tmpfiles to set file/directory attributes

2015-03-16 Thread Goffredo Baroncelli

Hi all,
This set of patches add two new line types to the tmpfiles files format.
These new types of line are 'h' and 'H' (the recursively version), and 
allow to change the file/directory attributes, like chattr(1) does.

One of the motivation of these patches is to get rid of the commit
11689d2a which force the NOCOW flag for the journal files. This was 
needed because systemd-journald has very poor performance when the
filesytem is BTRFS due to its the COW behavior. My concern is that 
the NOCOW flag also prevent BTRFS to rebuild a corrupted file from a 
good copy if it is available. 

With this patch, now the NOCOW flag can be set by systemd-tmpfiles.
See [1] for further information.

BR
G.Baroncelli

Changelog:
v1: first issue
v2: accepted several suggestion on the style; added function 
change_attr_fd(); used the _cleanup_close_; returned
negative errno;
v3: swapped arguments of change_attr_fd() in path_set_attrib()
v4:-changed the tables of attributes in function path_set_acls()
   -added a comment about the reason to ignore an error from lstat()
   -improved the man page of tmpfiles.d.5 highlighting that systemd-tmpfiles
updates only the listed attributes in the man page



[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html

--
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-20 Thread Goffredo Baroncelli
Hi Zbyszek,

On 2015-03-21 14:37, Zbigniew Jędrzejewski-Szmek wrote:
> On Mon, Mar 16, 2015 at 08:33:52PM +0100, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli 
>>
>> Add a new tmpfiles.d snippets to set the NOCOW attributes for the
>> journal files. This allow better perfomance when the root file system
>> is BTRFS. Pay attention that the NOCOW flags disables the checksum and
>> prevent scrub to rebuild a corrupted journal.
> I now merged patches 1-3/4, but not this one. Setting/unsetting
> attributes seems to be generally useful, so the rest stands on its
> own. The reason I held back with the last patch is that setting of the
> attributes through tmpfiles should be added together with the removal
> of the same functionality from journald. 

You are right, the patch #4 and the removal of the current code are coupled;
with the patch #1..#3 included, I will re-issue the #4 with another patch which 
reverts the code. And the discussion will restart.



> But there are some details to
> work out.
> 
> Setting +C on /var/log/journal/%m has smaller scope than the code in
> journal-file.c now. For example it does not cover files opened by
> systemd-journal-remote. 

I am not familiar with s*d-journal-remote; from the man page it seems that the 
log are stored /by default) in /var/log/journal/remote/ ; if so it is 
sufficient to add a line like

+h /var/log/journal/remote - - - - +C



> Having the files no-cow is just as useful for those,
> but it's not entirely clear how to support it in the new scheme.
> 
> Zbyszek
> 
>> +# set the journal file as NOCOW; only valid for BTRFS filesystem
>> +h /var/log/journal/%m - - - - +C
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd and nested Btrfs subvolumes

2015-03-20 Thread Goffredo Baroncelli
Hi Chris,
On 2015-03-20 02:27, Chris Murphy wrote:
> Short version:
> --
>   Instead of machinectl clone using btrfs snapshots, or even needing
> to store things in a var/lib/machines Btrfs subvolume, does it meet
> the requirements for Btrfs optimization to do this with cp -a
> --reflink instead?
> 
> Why? Nested subvolumes are confusing. And nested subvolumes are
> excluded from snapshots. Subvolume B inside of Subvolume A, will not
> be snapshot or rolled back, if I snapshot Subvolume A and subsequently
> rollback to the snapshot of A. 

Personally I prefer this kind of behavior. Each subvolumes may have its life 
cycle so a rollback on a volume doesn't means that the other also need it.

In the past [2] I proposed a patch to introduces a recursively snapshot; but it 
was unnoticed :-(

[...]
> Can clone instead use cp -a --reflink instead of using snapshots? Can
> subvolumes be entirely avoided?

[...]

> So back to cp -a --reflink as a work around for both paradigms? Does
> this method of cloning meet the requirements for systemd containers?
> If so, it works on both the RH/Fedora and the openSUSE layouts.
> Meaning the var/lib/machines subvolume isn't needed, just use cp -a
> --reflink on either directories or files, and it's almost as fast as a
> btrfs snapshot.

I think that a "cp --reflink" is slower than a snapshot, because all the 
metadata still have to be copied. Another disadvantage, is that a snapshot and 
its parent could be "easily" [3] compared; the same would be not true if a cp 
--reflink is used. 

> 
> 
> 
> [1]
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg42333.html
> 
> 


[2] http://comments.gmane.org/gmane.comp.file-systems.btrfs/30119
[3] "btrfs find-new" and "btrfs send" do that

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-03-21 Thread Goffredo Baroncelli
On 2015-03-22 20:53, Zbigniew Jędrzejewski-Szmek wrote:
> On Fri, Mar 20, 2015 at 07:06:28PM +0100, Goffredo Baroncelli wrote:
>> Hi Zbyszek,
>>
[...]
>>> But there are some details to
>>> work out.
>>>
>>> Setting +C on /var/log/journal/%m has smaller scope than the code in
>>> journal-file.c now. For example it does not cover files opened by
>>> systemd-journal-remote. 
>>
>> I am not familiar with s*d-journal-remote; from the man page it seems that 
>> the log are stored /by default) in /var/log/journal/remote/ ; if so it is 
>> sufficient to add a line like
>>
>> +h /var/log/journal/remote - - - - +C
> 
> That's the problem: current functionality works no matter where you
> store the files, but it's hard to provide the same level of
> flexibility with the tmpfiles-based solution.

Unfortunately the +C attributes has his downside, which cannot be ignored. All 
my work was due to the fact that the NOCOW attribute was set unconditionally. I 
proposed a patches set which adds an option to tune this behavior; Lennart 
asked to replace it with this "tmpfile.d" solution.

Frankly speaking, I think that it would be doable also to set the -C attributes 
by hand by the user/sysadmin[1]. My needing is only to avoid this 
*unconditionally* NOCOW attribute set, for these few cases (BTRFS w/RAID) where 
it could degrades the btrfs features ( without notifying the user !).

> 
> Zbyszek
> 

Goffredo

[1] It is needed only one time; after you set as +C the directory all the new 
files
inherits this attribute.  

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/2] Revert patch 11689d2a which force the NOCOW attribute

2015-03-21 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Revert patch 11689d2a, which force the NOCOW attribute for the journal files.
This patch was introduced to allievate the perfomances problem that
journald shows on the BTRFS filesystem.

Because the NOCOW attribute is forced the user can't revert to
the old behavior. However NOCOW attribute disables the btrfs checksums, which
prevent BTRFS to rebuild a currupted file in an RAIDx filesystem.

To continue to set the NOCOW attribute, use the h|H command of systemd-tmpfile.
---
 src/journal/journal-file.c | 28 ++--
 src/journal/journalctl.c   | 12 
 2 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/src/journal/journal-file.c b/src/journal/journal-file.c
index 2845e05..9d56069 100644
--- a/src/journal/journal-file.c
+++ b/src/journal/journal-file.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "btrfs-util.h"
 #include "journal-def.h"
@@ -142,17 +141,8 @@ void journal_file_close(JournalFile *f) {
 if (f->mmap && f->fd >= 0)
 mmap_cache_close_fd(f->mmap, f->fd);
 
-if (f->fd >= 0 && f->defrag_on_close) {
-
-/* Be friendly to btrfs: turn COW back on again now,
- * and defragment the file. We won't write to the file
- * ever again, hence remove all fragmentation, and
- * reenable all the good bits COW usually provides
- * (such as data checksumming). */
-
-(void) chattr_fd(f->fd, false, FS_NOCOW_FL);
-(void) btrfs_defrag_fd(f->fd);
-}
+if (f->fd >= 0 && f->defrag_on_close)
+btrfs_defrag_fd(f->fd);
 
 safe_close(f->fd);
 free(f->path);
@@ -2602,18 +2592,6 @@ int journal_file_open(
 goto fail;
 
 if (f->last_stat.st_size == 0 && f->writable) {
-
-/* Before we write anything, turn off COW logic. Given
- * our write pattern that is quite unfriendly to COW
- * file systems this should greatly improve
- * performance on COW file systems, such as btrfs, at
- * the expense of data integrity features (which
- * shouldn't be too bad, given that we do our own
- * checksumming). */
-r = chattr_fd(f->fd, true, FS_NOCOW_FL);
-if (r < 0)
-log_warning_errno(errno, "Failed to set file 
attributes: %m");
-
 /* Let's attach the creation time to the journal file,
  * so that the vacuuming code knows the age of this
  * file even if the file might end up corrupted one
@@ -2831,8 +2809,6 @@ int journal_file_open_reliably(
 
 /* btrfs doesn't cope well with our write pattern and
  * fragments heavily. Let's defrag all files we rotate */
-
-(void) chattr_path(p, false, FS_NOCOW_FL);
 (void) btrfs_defrag(p);
 
 log_warning("File %s corrupted or uncleanly shut down, renaming and 
replacing.", fname);
diff --git a/src/journal/journalctl.c b/src/journal/journalctl.c
index 55c7786..abb9d0c 100644
--- a/src/journal/journalctl.c
+++ b/src/journal/journalctl.c
@@ -1290,7 +1290,7 @@ static int setup_keys(void) {
 size_t mpk_size, seed_size, state_size, i;
 uint8_t *mpk, *seed, *state;
 ssize_t l;
-int fd = -1, r;
+int fd = -1, r, attr = 0;
 sd_id128_t machine, boot;
 char *p = NULL, *k = NULL;
 struct FSSHeader h;
@@ -1385,9 +1385,13 @@ static int setup_keys(void) {
 
 /* Enable secure remove, exclusion from dump, synchronous
  * writing and in-place updating */
-r = chattr_fd(fd, true, 
FS_SECRM_FL|FS_NODUMP_FL|FS_SYNC_FL|FS_NOCOW_FL);
-if (r < 0)
-log_warning_errno(errno, "Failed to set file attributes: %m");
+if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0)
+log_warning_errno(errno, "FS_IOC_GETFLAGS failed: %m");
+
+attr |= FS_SECRM_FL|FS_NODUMP_FL|FS_SYNC_FL|FS_NOCOW_FL;
+
+if (ioctl(fd, FS_IOC_SETFLAGS, &attr) < 0)
+log_warning_errno(errno, "FS_IOC_SETFLAGS failed: %m");
 
 zero(h);
 memcpy(h.signature, "KSHHRHLP", 8);
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/2] Add +C attrib to the journal files directories

2015-03-21 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add +C attrib to the journal files directories. The journal file format
behaves bad on a BTRFS filesystem: the performances decrease during the
time.
To avoid this issue, this tmpfile.d snippet sets the NOCOW attribute to the
journal files directories, so newly created journal files inherit the NCOOW
attribute.

Be aware that the NOCOW attribute disables the BTRFS checksums, prevent BTRFS
to rebuild a corrupted file in a RAIDx filesystem. However the perfomances
increase.
In a single disk filesystem (or a filesystem without redundancy) it is safe
to use the NOCOW flags.
---
 tmpfiles.d/journal-nocow.conf | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..8d9c1e8
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,19 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+# set the journal file as NOCOW; make sense only for BTRFS filesystem
+# WARNING: the NOCOW attribute disables the BTRFS checksums, prevent BTRFS
+#  to rebuild a corrupted file in a RAIDx filesystem. It is suggested
+#  to disables these setting in this kind of filesystem.
+#  However in a single disk filesystem (or a filesystem without 
+#  redundancy) it is safe to use the NOCOW flag.
+#  Setting the NOCOW flag the perfomances increase.
+
+h /var/log/journal/%m - - - - +C
+h /var/log/journal/remote - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] Set the NOCOW flag for the journal via tmpfiles

2015-03-21 Thread Goffredo Baroncelli

Hi all,

these patches set reverts the commit 11689d2 "journald: turn off COW for
journal files on btrfs" which enables *unconditionally* the NOCOW flag for the
journal files. The reason was that the performances of the journal file format
are very bad on btrfs, and decrease during the time. Disabling the COW
behavior, the perfomances increase.
Unfortunately disabling the COW behavior leads to disable the BTRFS checksums,
which in turn prevents BTRFS to rebuild a corrupted file in a RAID
filesystem [3].

To avoid that I proposed a patch which introduces a configurable option to
disables the "turn off COW" behavior[1]. Lennart commented that instead he
prefer to set the NOCOW attribute via tmpfile.d snippets.

A further patches set was proposed and accepted [2] to extend systemd-tmpfiles
to change the file attributes.

This last patches set removes the "turn off COW" behavior (patch #1) and
introduces a new tmpfiles.d snippet which enable the NOCOW beahvior for the
journal files (patch #2). So a sysadmin can disable this setting overriding
this file configuration.

BR
G.Baroncelli


[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html

[2] 
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg29313.html

[3] http://en.wikipedia.org/wiki/Btrfs#Checksum_tree_and_scrubbing

--
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Set the NOCOW flag for the journal via tmpfiles

2015-03-29 Thread Goffredo Baroncelli
Hi,
did you find the time to give a look a these patches ? Do you have any 
suggestions ?

BR
G.Baroncelli

On 2015-03-21 12:56, Goffredo Baroncelli wrote:
> 
> Hi all,
> 
> these patches set reverts the commit 11689d2 "journald: turn off COW for
> journal files on btrfs" which enables *unconditionally* the NOCOW flag for the
> journal files. The reason was that the performances of the journal file format
> are very bad on btrfs, and decrease during the time. Disabling the COW
> behavior, the perfomances increase.
> Unfortunately disabling the COW behavior leads to disable the BTRFS checksums,
> which in turn prevents BTRFS to rebuild a corrupted file in a RAID
> filesystem [3].
> 
> To avoid that I proposed a patch which introduces a configurable option to
> disables the "turn off COW" behavior[1]. Lennart commented that instead he
> prefer to set the NOCOW attribute via tmpfile.d snippets.
> 
> A further patches set was proposed and accepted [2] to extend systemd-tmpfiles
> to change the file attributes.
> 
> This last patches set removes the "turn off COW" behavior (patch #1) and
> introduces a new tmpfiles.d snippet which enable the NOCOW beahvior for the
> journal files (patch #2). So a sysadmin can disable this setting overriding
> this file configuration.
> 
> BR
> G.Baroncelli
> 
> 
> [1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
> https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html
> 
> [2] 
> https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg29313.html
> 
> [3] http://en.wikipedia.org/wiki/Btrfs#Checksum_tree_and_scrubbing
> 
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli 
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-04-08 Thread Goffredo Baroncelli
On 2015-04-08 20:36, Lennart Poettering wrote:
> On Mon, 16.03.15 20:33, Goffredo Baroncelli (kreij...@libero.it) wrote:
> 
> Sorry for warming up this old thread, but a few issues I see with this?
> 
>>  RECURSIVE_RELABEL_PATH = 'Z',
>> +SET_ATTRIB = 'h',
>> +RECURSIVE_SET_ATTRIB = 'H',
>>  } ItemType;
>>  
>>  typedef struct Item {
>> @@ -108,12 +111,15 @@ typedef struct Item {
>>  usec_t age;
>>  
>>  dev_t major_minor;
>> +int attrib_value;
>> +int attrib_mask;
> 
> I thought this was unsigned now?

Sew below



> 
>>  
>> +static int get_attrib_from_arg(Item *item) {
>> +static const unsigned attributes[] = {
>> +[(uint8_t)'A'] = FS_NOATIME_FL,/* do not update atime */
>> +[(uint8_t)'S'] = FS_SYNC_FL,  /* Synchronous updates */
>> +[(uint8_t)'D'] = FS_DIRSYNC_FL,   /* dirsync behaviour 
>> (directories only) */
>> +[(uint8_t)'a'] = FS_APPEND_FL,/* writes to file may 
>> only append */
>> +[(uint8_t)'c'] = FS_COMPR_FL, /* Compress file */
>> +[(uint8_t)'d'] = FS_NODUMP_FL,/* do not dump file */
>> +[(uint8_t)'e'] = FS_EXTENT_FL,/* Top of directory 
>> hierarchies*/
>> +[(uint8_t)'i'] = FS_IMMUTABLE_FL, /* Immutable file */
>> +[(uint8_t)'j'] = FS_JOURNAL_DATA_FL, /* Reserved for ext3 */
>> +[(uint8_t)'s'] = FS_SECRM_FL, /* Secure deletion */
>> +[(uint8_t)'u'] = FS_UNRM_FL,  /* Undelete */
>> +[(uint8_t)'t'] = FS_NOTAIL_FL,/* file tail should not 
>> be merged */
>> +[(uint8_t)'T'] = FS_TOPDIR_FL,/* Top of directory 
>> hierarchies*/
>> +[(uint8_t)'C'] = FS_NOCOW_FL, /* Do not cow file */
>> +};
> 
> Hmm, that's quite a sparse array! It wastes 116 entries, just to
> store 13 entries... I think this should really be an array fo structs
> with the chars and their flag, which we iterate through... 

I accepted a suggestion of Zbyszek; this solution avoids to loop over the 
list... 
> 
>> +const unsigned all_flags = FS_NOATIME_FL|FS_SYNC_FL|FS_DIRSYNC_FL|
>> +FS_APPEND_FL|FS_COMPR_FL|FS_NODUMP_FL|FS_EXTENT_FL|
>> +FS_IMMUTABLE_FL|FS_JOURNAL_DATA_FL|FS_SECRM_FL|FS_UNRM_FL|
>> +FS_NOTAIL_FL|FS_TOPDIR_FL|FS_NOCOW_FL;
>> +char *p = item->argument;
>> +enum {
>> +MODE_ADD,
>> +MODE_DEL,
>> +MODE_SET
>> +} mode = MODE_ADD;
>> +unsigned long value = 0, mask = 0;
> 
> And now it is "unsigned long", not just "unsigned anymore?

For 32 bit, the ioctl FS_IOC_GETFLAGS seems to be defined with an "int" as 
argument; for 64bit, the argument is a long; however chattr_fd() uses 
unsigned... Yes this is a mess; probably "int" is a good answer

> 
>> +
>> +static int path_set_attrib(Item *item, const char *path) {
>> +_cleanup_close_ int fd = -1;
>> +int r;
>> +unsigned f;
>> +struct stat st;
>> +
>> +/* do nothing */
>> +if (item->attrib_mask == 0 || !item->attrib_set)
>> +return 0;
>> +/*
>> + * It is OK to ignore an lstat() error, because the error
>> + * will be catch by the open() below anyway
>> + */
>> +if (lstat(path, &st) == 0 &&
>> +!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
>> +return 0;
>> +}
> 
> Why do we need this check? What happens if we apply this onto
> a device node, or socket, or whatever? 

I copied this check from the source of chattr; the reason is:

(from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=152029#10)
[...]
>$ lsattr /dev/dri/card0
>s-S--- /dev/dri/card0
>Segmentation fault (core dumped)
>
>$ ls -al /dev/dri/card0
>crw-rw-rw-1 root root 226,   0 jun 30 11:12 /dev/dri/card0

lsattr and chattr can not work against device files, since they use
ioctl's which are supported by the ext2 filesystem, but which aren't
supported by normal devices (and unfortunately normal devices don't
forwawrd the ioctl on to the ext2 filesystem).  In this case, the ioctl
used by ls

Re: [systemd-devel] [PATCH 1/2] Revert patch 11689d2a which force the NOCOW attribute

2015-04-11 Thread Goffredo Baroncelli
Hi Lennart,

On 2015-04-08 23:12, Lennart Poettering wrote:
>> --- a/src/journal/journalctl.c
>> > +++ b/src/journal/journalctl.c
>> > @@ -1290,7 +1290,7 @@ static int setup_keys(void) {
>> >  size_t mpk_size, seed_size, state_size, i;
>> >  uint8_t *mpk, *seed, *state;
>> >  ssize_t l;
>> > -int fd = -1, r;
>> > +int fd = -1, r, attr = 0;
>> >  sd_id128_t machine, boot;
>> >  char *p = NULL, *k = NULL;
>> >  struct FSSHeader h;
>> > @@ -1385,9 +1385,13 @@ static int setup_keys(void) {
>> >  
>> >  /* Enable secure remove, exclusion from dump, synchronous
>> >   * writing and in-place updating */
>> > -r = chattr_fd(fd, true, 
>> > FS_SECRM_FL|FS_NODUMP_FL|FS_SYNC_FL|FS_NOCOW_FL);
>> > -if (r < 0)
>> > -log_warning_errno(errno, "Failed to set file attributes: 
>> > %m");
>> > +if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0)
>> > +log_warning_errno(errno, "FS_IOC_GETFLAGS failed: %m");
>> > +
>> > +attr |= FS_SECRM_FL|FS_NODUMP_FL|FS_SYNC_FL|FS_NOCOW_FL;
>> > +
>> > +if (ioctl(fd, FS_IOC_SETFLAGS, &attr) < 0)
>> > +log_warning_errno(errno, "FS_IOC_SETFLAGS failed: %m");
> This is unrelated, and should not be reverted at all.
> 
> Lennart

Ok, this was a my fault to revert this chunk; anyway I would like to know which 
is the purpose of the FS_NOCOW_FL flag here. If I read the code, the file is 
few hundred bytes long, so in BTRFS this would be stored in the metadata chunk, 
and I am not sure if FS_NOCOW_FL is honored at all...


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-04-12 Thread Goffredo Baroncelli
On 2015-04-12 15:12, Lennart Poettering wrote:
> On Sat, 11.04.15 17:07, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:
> 
>>>> That's the problem: current functionality works no matter where you
>>>> store the files, but it's hard to provide the same level of
>>>> flexibility with the tmpfiles-based solution.
>>>
>>> Well, but we never store files outside of /var/log/journal,
>>> /var/log/journal/%m and /var/log/journal/remote/, do we? 
>> We can, if instructed to do so. journal-remote can store files wherever.
>>
>> Original motivation for this patch was to make the NOCOW on journal files
>> configurable without too much fuss and without making it an explicit option.
>> Journal files on btrfs without NOCOW are rather slow, so it seems that most
>> people will want NOCOW on. But with the proposed patch, people would want
>> to add the tmpfile snippet to set NOCOW for every location they write too,
>> which is very visible and requires explicit configuration. Doing this in
>> journal-file directly was simple, synchronous, and worked everywhere, and
>> we are replacing this with a more complicated and more brittle scheme.
>>
>> Dunno, if you think things are better this way, I'm fine with that,
>> since both schemes should get the job done. But in the end, adding a
>> switch in journald.conf seems more in the systemd spirit of doing the right
>> thing automatically and also less work for both sides...
> 
> What about this solution: let's go the tmpfiles way, but also add some
> code to the journal file writer to log at INFO level an actionable
> recommendation to turn on the c flag on the directory if we notice
> that the newly created file doesn't have it, and it is located on
> btrfs?

After the work that I done to the tmpfiles, I have to agree with Zbyszek. 
Adding an option to the journal.conf file is the more reasonable thing to do.

If we add code that performs only a check in the code of journal, I think that 
we have the worst solution:
- journal code is still aware of the NOCOW attribute (== more code complexity)
- the user had to update/manage a tmpfile.d  manually

Let me to suggest the opposite solution:
- add a switch in the journald.conf file
- add a check that raise a warning if the NOCOW flag is not-used/used

GB



> 
> That way, folks who use the tools on non-default directories will at
> least be guided to an explanation of the general problem.
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 4/4] Add a new tmpfiles.d snippets to set the NOCOW the journal.

2015-04-12 Thread Goffredo Baroncelli
On 2015-04-12 18:06, Lennart Poettering wrote:
> On Sun, 12.04.15 17:29, Goffredo Baroncelli (kreij...@inwind.it) wrote:
> 
>> On 2015-04-12 15:12, Lennart Poettering wrote:
>>> On Sat, 11.04.15 17:07, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) 
>>> wrote:
>>>
>>>>>> That's the problem: current functionality works no matter where you
>>>>>> store the files, but it's hard to provide the same level of
>>>>>> flexibility with the tmpfiles-based solution.
>>>>>
>>>>> Well, but we never store files outside of /var/log/journal,
>>>>> /var/log/journal/%m and /var/log/journal/remote/, do we? 
>>>> We can, if instructed to do so. journal-remote can store files wherever.
>>>>
>>>> Original motivation for this patch was to make the NOCOW on journal files
>>>> configurable without too much fuss and without making it an explicit 
>>>> option.
>>>> Journal files on btrfs without NOCOW are rather slow, so it seems that most
>>>> people will want NOCOW on. But with the proposed patch, people would want
>>>> to add the tmpfile snippet to set NOCOW for every location they write too,
>>>> which is very visible and requires explicit configuration. Doing this in
>>>> journal-file directly was simple, synchronous, and worked everywhere, and
>>>> we are replacing this with a more complicated and more brittle scheme.
>>>>
>>>> Dunno, if you think things are better this way, I'm fine with that,
>>>> since both schemes should get the job done. But in the end, adding a
>>>> switch in journald.conf seems more in the systemd spirit of doing the right
>>>> thing automatically and also less work for both sides...
>>>
>>> What about this solution: let's go the tmpfiles way, but also add some
>>> code to the journal file writer to log at INFO level an actionable
>>> recommendation to turn on the c flag on the directory if we notice
>>> that the newly created file doesn't have it, and it is located on
>>> btrfs?
>>
>> After the work that I done to the tmpfiles, I have to agree with
>> Zbyszek. Adding an option to the journal.conf file is the more
>> reasonable thing to do.
> 
> You mean journald.conf I figure? That's not even read by the remoting
> tools, so how is that a solution?

In my first attempt, I added a switch to the command line.

> 
>> If we add code that performs only a check in the code of journal, I think 
>> that we have the worst solution:
>> - journal code is still aware of the NOCOW attribute (== more code 
>> complexity)
>> - the user had to update/manage a tmpfile.d  manually
> 
> Well, again: the nocow thing is a work-around around a design issue
> with btrfs, and btrfs is working on correcting that, by adding
> auto-defrag to deal better with write patterns such as this.
> 
> We should not add new explicit config options for things we already
> know *now* that they are a stopgap, and will go away eventually.
> 
> With the solution I propose (which is tmpfiles snippet + warning if
> non-enabled) we get:
> 
> - default behaviour is fast
> 
> - default behaviour is easy to override
> 
> - specialist users who use the remoting feature and use the thing on a
>   non-default directory, are notified about the issue at hand.
> 
> - we can easily get rid of it eventually, simply by dropping one
>   config line and the generation of a warning. There's no option to
>   deprecate then, and keep compat for.

Even if I agree with you about the points above, I am not fully
convinced about changing the NOCOW attribute via tmpfile snippet:
it seems to me an overkill solution..

But I prefer the snippet solution to the old behavior (set the NOCOW attribute
coded in systemd-journald).

> 
>> Let me to suggest the opposite solution:
>> - add a switch in the journald.conf file
>> - add a check that raise a warning if the NOCOW flag is not-used/used
>>
> 
> This does not fix the remoting issue, since journald.conf isn't read
> by those tools. Also, it adds a setting we'll eventuall have to get
> rid of again. 
> 
> Sorry, but I am really against a solution like that. I don't want to
> litter the config file with options that are hacks and will go away
> one day...
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH V2] Set the NOCOW flag for the journal via the tmpfiles

2015-04-12 Thread Goffredo Baroncelli

Hi all,

these patches set partially reverts the commit 11689d2 "journald: turn off COW
for journal files on btrfs" which enables *unconditionally* the NOCOW flag for
the journal files. The reason was that the performances of the journal file
format are very bad on btrfs, and decrease during the time. Disabling the COW
behavior, the perfomances increase.
Unfortunately disabling the COW behavior leads to disable the BTRFS checksums,
which in turn prevents BTRFS to rebuild a corrupted file in a RAID
filesystem [3].

To avoid that I proposed a patch which introduces a configurable option to
disables the "turn off COW" behavior[1]. Lennart commented that instead he
prefer to set the NOCOW attribute via tmpfile.d snippets.

A further patches set was proposed and accepted [2] to extend systemd-tmpfiles
to change the file attributes.

This last patches set removes the "turn off COW" behavior (patch #1) and
introduces a new tmpfiles.d snippet which enable the NOCOW beahvior for the
journal files (patch #2). So a sysadmin can disable this setting overriding
this file configuration.

BR
G.Baroncelli

Changelog:
v1: first issue
v2: revert only the part related to SETTING the NOCOW attribute of the
commit 11689d2
update the comment in the "tmpfile snippet"


[1] Re: [systemd-devel] [RFC][PATCH] Add option to enable COW for journal file
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg28724.html

[2] 
https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg29313.html

[3] http://en.wikipedia.org/wiki/Btrfs#Checksum_tree_and_scrubbing

--
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 1/2] Partially revert patch 11689d2a which force the NOCOW attribute.

2015-04-12 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Partially revert patch 11689d2a, which force the NOCOW attribute for the
journal files. This patch was introduced to allievate the perfomances
problem that journald shows on the BTRFS filesystem.

Because the NOCOW attribute is forced the user can't revert to
the old behavior. However NOCOW attribute disables the btrfs checksums,
which prevent BTRFS to rebuild a currupted file in an RAIDx filesystem.

To continue to set the NOCOW attribute, use the h|H command of
systemd-tmpfile.
---
 src/journal/journal-file.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/src/journal/journal-file.c b/src/journal/journal-file.c
index 2845e05..65e3e38 100644
--- a/src/journal/journal-file.c
+++ b/src/journal/journal-file.c
@@ -2602,18 +2602,6 @@ int journal_file_open(
 goto fail;
 
 if (f->last_stat.st_size == 0 && f->writable) {
-
-/* Before we write anything, turn off COW logic. Given
- * our write pattern that is quite unfriendly to COW
- * file systems this should greatly improve
- * performance on COW file systems, such as btrfs, at
- * the expense of data integrity features (which
- * shouldn't be too bad, given that we do our own
- * checksumming). */
-r = chattr_fd(f->fd, true, FS_NOCOW_FL);
-if (r < 0)
-log_warning_errno(errno, "Failed to set file 
attributes: %m");
-
 /* Let's attach the creation time to the journal file,
  * so that the vacuuming code knows the age of this
  * file even if the file might end up corrupted one
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 2/2] Add +C attrib to the journal files directories

2015-04-12 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add +C attrib to the journal files directories. The journal file format
behaves bad on a BTRFS filesystem: the performances decrease during the
time.
To avoid this issue, this tmpfile.d snippet sets the NOCOW attribute to the
journal files directories, so newly created journal files inherit the NCOOW
attribute.

Be aware that the NOCOW attribute disables the BTRFS checksums, prevent BTRFS
to rebuild a corrupted file in a RAIDx filesystem. However the perfomances
increase.
In a single disk filesystem (or a filesystem without redundancy) it is safe
to use the NOCOW flags.
---
 tmpfiles.d/journal-nocow.conf | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 tmpfiles.d/journal-nocow.conf

diff --git a/tmpfiles.d/journal-nocow.conf b/tmpfiles.d/journal-nocow.conf
new file mode 100644
index 000..493
--- /dev/null
+++ b/tmpfiles.d/journal-nocow.conf
@@ -0,0 +1,22 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See tmpfiles.d(5) for details
+
+# Set the NOCOW attribute for directories of journal files; this flag is
+# inheredited by their new files and sub-directories; valid only for a BTRFS
+# filesystem
+# WARNING: enabling the NOCOW attribute improves the perfomance, but also
+#disables the BTRFS checksums. In a RAID BTRFS filesystem, the checksums
+#are needed to rebuild a corrupted file; without checksums a rebuild is
+#not possible.
+# In a single-disk filesystem (or a filesystem without redundancy) enabling
+# the NOCOW attribute for the journal files is safe, because these have their
+# own checksums and a rebuilding wouldn't be possible in any case
+
+h /var/log/journal/%m - - - - +C
+h /var/log/journal/remote - - - - +C
-- 
2.1.4

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 2/4] Allow systemd-tmpfiles to set the file/directory attributes

2015-04-12 Thread Goffredo Baroncelli
Hi Lennart,

On 2015-04-08 20:36, Lennart Poettering wrote:
>> +
>> > +static int path_set_attrib(Item *item, const char *path) {
>> > +_cleanup_close_ int fd = -1;
>> > +int r;
>> > +unsigned f;
>> > +struct stat st;
>> > +
>> > +/* do nothing */
>> > +if (item->attrib_mask == 0 || !item->attrib_set)
>> > +return 0;
>> > +/*
>> > + * It is OK to ignore an lstat() error, because the error
>> > + * will be catch by the open() below anyway
>> > + */
>> > +if (lstat(path, &st) == 0 &&
>> > +!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
>> > +return 0;
>> > +}
> Why do we need this check? What happens if we apply this onto
> a device node, or socket, or whatever? 
> 
> I really don#t like this seperate lstat() + open(). If it all it
> should be open() + fstat(), to avoid TTOCTOU issues...

I am checking your changes; you modified the code above in :

fd = open(path, O_RDONLY|O_NONBLOCK|O_CLOEXEC);
if (fd < 0)
return log_error_errno(errno, "Cannot open '%s': %m", path);

if (fstat(fd, &st) < 0)
return log_error_errno(errno, "Cannot stat '%s': %m", path);

/* Issuing the file attribute ioctls on device nodes is not
 * safe, as that will be delivered to the drivers, not the
 * file system containing the device node. */
if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
log_error("Setting file flags is only supported on regular 
files and directories, cannot set on '%s'.", path);
return -EINVAL;
}


However the original code catch also the case where the file is a soft-link.
The same check is performed also by chattr(1); I suggest to leave the original
behavior, changing

fd = open(path, O_RDONLY|O_NONBLOCK|O_CLOEXEC);
in
fd = open(path, O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_NOFOLLOW);

and checking if the errno is ELOOP. In this case a further check is performed 
to 
verify if the file is a link or the error is due to a too many symbolic link.
Then an appropriate message error is printed.

What do you think ?

Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 1/2] Partially revert patch 11689d2a which force the NOCOW attribute.

2015-04-23 Thread Goffredo Baroncelli
Hi,

On 2015-04-22 13:33, Lennart Poettering wrote:
> On Sun, 12.04.15 20:30, Goffredo Baroncelli (kreij...@libero.it) wrote:
> 
>> From: Goffredo Baroncelli 
>>
>> Partially revert patch 11689d2a, which force the NOCOW attribute for the
>> journal files. This patch was introduced to allievate the perfomances
>> problem that journald shows on the BTRFS filesystem.
>>
>> Because the NOCOW attribute is forced the user can't revert to
>> the old behavior. However NOCOW attribute disables the btrfs checksums,
>> which prevent BTRFS to rebuild a currupted file in an RAIDx filesystem.
>>
>> To continue to set the NOCOW attribute, use the h|H command of
>> systemd-tmpfile.
> 
> I have now commited a similar patch that replaces the patching of the
> attribute for new journal files with a warning if we notice the
> journal file is on btrfs and the flag is not set.
> 
> I hope this settles the issue!

Many thanks for working on that !
BR
G.Baroncelli


> 
> Thanks!
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-10 Thread Goffredo Baroncelli
On 03/07/2014 07:26 PM, Lennart Poettering wrote:
> Heya!
> 
> Since yesterday systemd in git can now discover root, /home, /srv and
> swap partitions automatically based on GPT type GUIDs, thus making
> /etc/fstab unnecessary for simple setups.
> 
> I have now put together something like a spec describing the logic
> behind that, and what it is good for:

> http://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
> 

Form the FAQ:

[...] What about automatic mounting of btrfs subvolumes to /var, /home and so 
on?

Doing a similar automatic discovery of btrfs subvolumes and mounting them 
automatically to the appropriate places is certainly desirable. We are waiting 
for the btrfs designers to add a per-subvolume type UUID to their disk format 
to make this possible. [...]


Instead of relying on the subvolume UUID, why not relying to the subvolume 
name: it would be more simple and flexible to manage them.

For example supposing to use '@' as prefix for a subvolume name:

@   -> root filesystem
@etc-> etc
@home   -> home
[...]

If you want multiple OS on the same filesystem we can use the following 
convention

@home   -> home of all the systems
@srv-> srv  of all the systems
@fedora_-> root of a fedora system
@fedora_etc -> etc of the fedora system
@fedora2_   -> root of a fedora2 system
@fedora2_etc-> etc of the fedora2 system

Or in another way we could group the different systems in subdirectories:

@home   -> home of all the systems
@srv-> srv  of all the systems
fedora/@-> root of a fedora system
fedora/@etc -> etc of the fedora system
fedora2/@   -> root of a fedora2 system
fedora2/@etc    -> etc of the fedora2 system



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-10 Thread Goffredo Baroncelli
Hi Kay
On 03/10/2014 07:53 PM, Kay Sievers wrote:
[...]
>>
>> Instead of relying on the subvolume UUID, why not relying to the subvolume 
>> name: it would be more simple and flexible to manage them.
> 
> As a general rule: human-readable names should be left to the
> administrator, provide an identifier for humans, and should not be
> overloaded with magic machine behavior.

In general I agree with you. 
But using "a name" you can manage multiple system on the same filesystem. This 
is impossible with the UUID.

> 
> KayGoffredo-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-10 Thread Goffredo Baroncelli
On 03/10/2014 09:02 PM, Lennart Poettering wrote:
> On Mon, 10.03.14 19:34, Goffredo Baroncelli (kreij...@libero.it) wrote:
> 
> Heya,
> 
>> Instead of relying on the subvolume UUID, why not relying to the subvolume 
>> name: it would be more simple and flexible to manage them.
>>
>> For example supposing to use '@' as prefix for a subvolume name:
>>
>> @-> root filesystem
>> @etc -> etc
>> @home-> home
>> [...]
> 
> Well, the name is property of the admin really. There needs to be a way
> how the admin can label his subvolumes, with a potentially localized
> name. This makes it unsuitable for our purpose, we cannot just take
> possession of this and leave the admin with nothing.

Instead of the name we can use the xattr to store these information.


> On GPT there are also gpt partition labels and partition types. The
> former are property of the admin, he can place there whatever he wants,
> in whatever language he chooses... The latter however is how we make
> sense of it on a semantical level.
> 
>> Or in another way we could group the different systems in subdirectories:
>>
>> @home-> home of all the systems
>> @srv -> srv  of all the systems
>> fedora/@ -> root of a fedora system
>> fedora/@etc  -> etc of the fedora system
>> fedora2/@-> root of a fedora2 system
>> fedora2/@etc -> etc of the fedora2 system
> 
> I am pretty sure automatic discovery of mount points should not cover
> the usecase where people install multiple distributions into the same
> btrfs volume. THe automatic logic should cover the simple cases only,
> and it sounds way over the top to support installing multiple OSes into
> the same btrfs... I mean, people can do that, if they want to, they just
> have to write a proper fstab, which I think is not too much too ask...

In your specification, you referred the use case of "container" (via nspawn / 
libvrt-lxc). which have to boot "a disk image". Why you don't mind to use a 
container on a btrfs snapshot ? I think that it will be reasonable to have 
different containers on a snapshots of the same filesystem-tree.


> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-10 Thread Goffredo Baroncelli
On 03/10/2014 09:21 PM, Chris Mason wrote:
> On 03/10/2014 04:02 PM, Lennart Poettering wrote:
>> On Mon, 10.03.14 19:34, Goffredo Baroncelli (kreij...@libero.it) wrote:
[...]
>> I am pretty sure automatic discovery of mount points should not cover
>> the usecase where people install multiple distributions into the same
>> btrfs volume. THe automatic logic should cover the simple cases only,
>> and it sounds way over the top to support installing multiple OSes into
>> the same btrfs... I mean, people can do that, if they want to, they just
>> have to write a proper fstab, which I think is not too much too ask...
> 
> Thinking more about this, using the UUIDs does make it harder for the
> admin to roll back and forth between snapshots. This is similar to
> the multiple install idea, but the goal would be easily jumping back
> to the old one if an update failed.
> 
> I'm not against anything that makes us more flexible here, just
> trying to nail down the use case a little bit more.>

We can store the mount point in a xattr. Also we can store the snapshots 
relation (parent/child or real/rollback) in a xattr. During the boot a 
"systemd-btrfs-fstab-generator" could generate on the fly the right mounts list.



 
> -chris
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-12 Thread Goffredo Baroncelli
On 03/12/2014 06:24 PM, Chris Mason wrote:
> 
> 
> On 03/10/2014 07:45 PM, Lennart Poettering wrote:
>> On Mon, 10.03.14 23:39, Goffredo Baroncelli (kreij...@libero.it) wrote:
>>
>>>> Well, the name is property of the admin really. There needs to be a way
>>>> how the admin can label his subvolumes, with a potentially localized
>>>> name. This makes it unsuitable for our purpose, we cannot just take
>>>> possession of this and leave the admin with nothing.
>>>
>>> Instead of the name we can use the xattr to store these information.
>>
>> Ah, using xattrs for this is indeed an option. That way we should be able
>> attach any kind of information we like to a subvolume.
>>
>> Hmm, I figure though that there is no way currently to read xattrs off a
>> subvolume without first mounting them individually? Having to mount all
>> subvolumes before we can make sense of them and mount them to the right
>> place certainly sounds less than ideal...
> 
> Ok, are we hoping to pull the xattrs off the disk before mounting
> anything? Or can we do a mount in a side directory first to scan for
> subvols?
> 
> I like the idea of something like this:
> 
> mount device on /search_for_fstab cd
> /search_for_fstab/
> 
> read xattrs on directories it finds to see where they should be
> mounted in the FS. xattrs may include mount options and special
> flags.

I am working to prototype something like that. A "mount.btrfs" command which
1) handles the rollback (i.e. the user make a snapshot which is a rollback; if 
something goes wrong and the machine reboot before ending the process, during 
the subvolume mounting phase the rollback replaces the "original" subvolume)

2) handles the automount (i.e. if a subvolume has the right xattr it is 
automatically mounted in the right place)

My idea is that the subvolume are grouped so:

@ simple subvolume
@. snapshot of subvolume 
@.rollbackrollback subvolume

If @.rollback exists, then it replace @ (something went wrong, the 
machine rebooted so the rollback have to take place of the original subvolume)

For each @ subvolume the following xattrs are considered:
user.btrfs.automount=1|0the subvolume has to be automounted
user.btrfs.mntpoint=  subvolume mount point


So this "mount.btrsf" command should:

1) mount the root btrfs filesystem (subvolid=5) in a temporary directory
2) performing the auto rollback (this behaviour can be controlled by another 
xattr)
3) mount the subvolume "@" as "root" (like the default one) in the right mount 
point
4) for each subvolume which has user.btrfs.automount=1, it should be mounted 
under the path stored in the "user.btrfs.mntpoint" xattr (relative to "@" or 
absolute)
5) umount the btrfs filesystem mounted at #1, or better move it to a new 
position in order
to allow managing (snapshot) of the different subvolumes.

Thoughts ?

BR
G.Baroncelli

> 
> mount the things you find
> 
> umount /search_for_fstab
> 
> I like that it's not actually btrfs specific, since the bind mounts
> will work for any FS. But it naturally fits the /@ namespace that we
> seem to be favoring as a collection point for snapshots and
> subvolumes. 
> -chris
> 
> 
> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-12 Thread Goffredo Baroncelli
On 03/12/2014 08:31 PM, Chris Murphy wrote:
> 
> On Mar 12, 2014, at 1:12 PM, Goffredo Baroncelli  wrote:
> 
[...]
>>
>> I am working to prototype something like that. A "mount.btrfs" command which
>> 1) handles the rollback (i.e. the user make a snapshot which is a rollback; 
>> if something goes wrong and the machine reboot before ending the process, 
>> during the subvolume mounting phase the rollback replaces the "original" 
>> subvolume)
>>
>> 2) handles the automount (i.e. if a subvolume has the right xattr it is 
>> automatically mounted in the right place)
>>
>> My idea is that the subvolume are grouped so:
>>
>> @  simple subvolume
>> @.  snapshot of subvolume 
>> @.rollback rollback subvolume
>>
>> If @.rollback exists, then it replace @ (something went wrong, 
>> the machine rebooted so the rollback have to take place of the original 
>> subvolume)
>>
>> For each @ subvolume the following xattrs are considered:
>> user.btrfs.automount=1|0 the subvolume has to be automounted
>> user.btrfs.mntpoint=   subvolume mount point
>>
>>
>> So this "mount.btrsf" command should:
>>
>> 1) mount the root btrfs filesystem (subvolid=5) in a temporary directory
>> 2) performing the auto rollback (this behaviour can be controlled by another 
>> xattr)
>> 3) mount the subvolume "@" as "root" (like the default one) in the right 
>> mount point
>> 4) for each subvolume which has user.btrfs.automount=1, it should be mounted 
>> under the path stored in the "user.btrfs.mntpoint" xattr (relative to "@" or 
>> absolute)
>> 5) umount the btrfs filesystem mounted at #1, or better move it to a new 
>> position in order
>> to allow managing (snapshot) of the different subvolumes.
>>
>> Thoughts ?
> 
> In effect this obviates boot parameter rootflags=subvol= since the
> file system metadata self-describe the subvol to be mounted,
> correct?
Definetely; 
> 
> Your suggestion also sounds like it places snapshots outside of their
> parent subvolume? 
yes, to me snapshot and "parent subvolume" are sibling more than parent/child. 
This simplify the rollback scenario. For example using the convention above we 
can rollback all the subvolumes mounting all the snapshots where the timestamp 
is less than the desired value.

> If so it mitigates a possible security concern if
> the snapshot contains (old) binaries with vulnerabilities. I asked
> about how to go about assessing this on the Fedora security list: 
> https://lists.fedoraproject.org/pipermail/security/2014-February/001748.html
>
> There aren't many replies but the consensus is that it's a legitimate
> concern, so either the snapshots shouldn't be persistently available
> (which is typical with e.g. snapper, and also
> yum-plugin-fs-snapshot), and/or when the subvolume containing
> snapshots is mounted, it's done with either mount option noexec or
> nosuid (no consensus on which one, although Gnome Shell uses nosuid
> by default when automounting removable media). 

Interesting observation
> 
> 
> Chris Murphy
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADS-UP] Discoverable Partitions Spec

2014-03-12 Thread Goffredo Baroncelli
On 03/12/2014 08:12 PM, Goffredo Baroncelli wrote:
[...]
> I am working to prototype something like that. A "mount.btrfs" command which
> 1) handles the rollback (i.e. the user make a snapshot which is a rollback; 
> if something goes wrong and the machine reboot before ending the process, 
> during the subvolume mounting phase the rollback replaces the "original" 
> subvolume)
> 
> 2) handles the automount (i.e. if a subvolume has the right xattr it is 
> automatically mounted in the right place)
> 
> My idea is that the subvolume are grouped so:
> 
> @   simple subvolume
> @.   snapshot of subvolume 
> @.rollback  rollback subvolume
> 
> If @.rollback exists, then it replace @ (something went wrong, 
> the machine rebooted so the rollback have to take place of the original 
> subvolume)
> 
> For each @ subvolume the following xattrs are considered:
> user.btrfs.automount=1|0  the subvolume has to be automounted
> user.btrfs.mountpoint=  subvolume mount point
> 
> 
> So this "mount.btrfs" command should:
> 
> 1) mount the root btrfs filesystem (subvolid=5) in a temporary directory
> 2) performing the auto rollback (this behaviour can be controlled by another 
> xattr)
> 3) mount the subvolume "@" as "root" (like the default one) in the right 
> mount point
> 4) for each subvolume which has user.btrfs.automount=1, it should be mounted 
> under the path stored in the "user.btrfs.mntpoint" xattr (relative to "@" or 
> absolute)
> 5) umount the btrfs filesystem mounted at #1, or better move it to a new 
> position in order
> to allow managing (snapshot) of the different subvolumes.
> 


Below my POC.

$ sudo mount /dev/loop0 t
$ sudo getfattr -d t/{@*,.}
# file: t/@
user.btrfs.automount="1"

# file: t/@__boot__
user.btrfs.automount="1"
user.btrfs.mountpoint="/boot"

# file: t/@.broken-0
user.btrfs.automount="1"

# file: t/@.broken-1
user.btrfs.automount="1"

# file: t/@__home__
user.btrfs.automount="1"
user.btrfs.mountpoint="/home"

# file: t/@__root__.12345
user.btrfs.automount="0"

# file: t/@space space
user.btrfs.automount="0"

# file: t/@__srv__
user.btrfs.automount="0"

# file: t/.
user.btrfs.automount="1"
user.btrfs.autorollback="1"
user.btrfs.mountpoint="/var/btrfs"

$ sudo umount t


$ sudo sh mount.btrfs /dev/loop0 t
Mount /dev/loop0/@ ->  t
Mount /dev/loop0/@__boot__ ->  t//boot
Mount /dev/loop0/@__home__ ->  t//home
Mount /dev/loop0 -> t//var/btrfs
ghigo@venice:~/btrfs/mount-btrfs$ find t
t
t/var
t/var/btrfs
t/var/btrfs/@__boot__
t/var/btrfs/@__home__
t/var/btrfs/@__srv__
t/var/btrfs/@__root__.12345
t/var/btrfs/@space space
t/var/btrfs/@.broken-0
t/var/btrfs/@.broken-0/var
t/var/btrfs/@.broken-0/var/btrfs
t/var/btrfs/@.broken-0/@__boot__
t/var/btrfs/@.broken-0/@__home__
t/var/btrfs/@.broken-0/boot
t/var/btrfs/@.broken-0/home
t/var/btrfs/@.broken-0/non-rollback-subvol
t/var/btrfs/@.broken-1
t/var/btrfs/@.broken-1/var
t/var/btrfs/@.broken-1/var/btrfs
t/var/btrfs/@.broken-1/@__boot__
t/var/btrfs/@.broken-1/@__home__
t/var/btrfs/@.broken-1/boot
t/var/btrfs/@.broken-1/home
t/var/btrfs/@.broken-1/rollback-subvol
find: File system loop detected; ‘t/var/btrfs/@’ is part of the same file 
system loop as ‘t’.
t/@__boot__
t/@__home__
t/boot
t/home
t/non-rollback-subvol





$cat mount.btrfs
#!/bin/sh

XAAMOUNT="user.btrfs.automount"
XAAMNTPNT="user.btrfs.mountpoint"
XAAROLLBACK="user.btrfs.autorollback"

xdebug() {
[ -z "$DEBUG" ] && return
echo "$@"
}

autorollback() {

( cd "$tmpdir"; ls -d @*.rollback 2>/dev/null) | while read drb; do

rbfullpath="$tmpdir/$drb"
[ ! -d "$rbfullpath" ] && continue
is_btrfs_subvol "$rbfullpath" || continue

f="$(get_fattr $XAAROLLBACK "$rbfullpath" )"
[ "x$f" = "x0" ] && continue

d="$(echo $drb | sed -e "s/.rollback$//")"
fullpath="$tmpdir/$d"

if [ -d "$fullpath" ]; then
f="$(get_fattr $XAAROLLBACK "$fullpath" )"
[ "x$f" = "x0" ] && continue

i=0
while true; do
newfullpath="$fullpath.broken-$i"
if [ ! -e  "$newfullpath" ]; then
mv "$fullpath" "$newfullpath"
break
fi
i=$(($i+1))
done
fi

echo "Rollback $d"
mv "

Re: [systemd-devel] timed out waiting for device dev-disk-by\x2duuid

2014-05-15 Thread Goffredo Baroncelli
On 05/15/2014 08:16 PM, Lennart Poettering wrote:
> On Thu, 15.05.14 19:29, Lennart Poettering (lenn...@poettering.net) wrote:
> 
>> On Mon, 12.05.14 20:48, Chris Murphy (li...@colorremedies.com) wrote:
>>
[...]
> 
> So, as it turns out there's no kernel APi available to check whether a
> btrfs raid array is now complete enough for degraded mode to
> succeed. There's only a way to check whether it is fully complete.
> 
> And even if we had an API for this, how would this even work at all? 

In what this should be different than the normal RAID system ? 

In both case there are two timeout: the first one is for waiting the full 
system, the second one is for the minimal set of disks to a degraded mode. If 
even the second timeout is passed, then we should consider the filesystem not 
build-able.

How it is handle for the RAID system ? Knowing that we should consider to apply 
the same strategies fro btrfs (may be we need some userspace tool to do that)






> mean, just having a catchall switch to boot in degraded mode is really
> dangerous if people have more than one array and we might end up
> mounting an fs in degraded mode that actually is fully available if we
> just waited 50ms longer...
> 
> I mean this is even the problem with just one array: if you have
> redundancy of 3 disks, when do you start mounting the thing when
> degraded mode is requested on the kernel command line? as soon as
> degrdaded mounting is possible (thus fucking up possible all 3 disks
> that happened to show up last), or later?
> 
> I have no idea how this all should work really, it's a giant mess. There
> probably needs to be some btrfs userspace daemon thing that watches
> btrfs arrays and does some magic if they timeout.
> 
> But for now I am pretty sure we should just leave everything in fully
> manual mode, that's the safest thing to do...
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-11 Thread Goffredo Baroncelli
Hi all,

I would like to share a my experience about a slowness of systemd when used on 
BTRFS.

My boot time was very high (about ~50 seconds); most of time it was due to 
NetworkManager which took about 30-40 seconds to start (this data came from 
"systemd-analyze plot").

I make several attempts to address this issue. Also I noticed that sometime 
this problem disappeared; but I was never able to understand why.

However this link

https://bugzilla.redhat.com/show_bug.cgi?id=1006386

suggested me that the problem could be due to a bad interaction between systemd 
and btrfs. NetworkManager was innocent. 
It seems that systemd-journal create a very hight fragmented files when it 
stores its log. And BTRFS it is know to behave slowly when a file is highly 
fragmented.
This had caused a slow startup of systemd-journal, which in turn had blocked 
the services which depend by the loggin system.

In fact after I de-fragmented the files under /var/log/journal [*], my boot 
time decreased of about 20second (from 50s to 30s).

Unfortunately I don't have any data to show. The next time I will try to 
collect more information. But I am quite sure that when the log are highly 
fragmented systemd-journal becomes very slow on BTRFS.

I don't know if the problem is more on the systemd side or btrfs side. What I 
know is that both the projects likely will be important in the near futures, 
and both must work well together.

I know that I can "chattr +C" to avoid COW for some files; but I don't want to 
lost also the checksum protection. 

If someone is able to suggest me how FRAGMENT the log file, I can try to 
collect more scientific data.


BR
G.Baroncelli

[*] 
# btrfs fi defrag /var/log/journal/*/*



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] R: Re: Slow startup of systemd-journal on BTRFS

2014-06-12 Thread Goffredo Baroncelli


>Messaggio originale
>Da: li...@colorremedies.com
>Data: 12/06/2014 2.40
>A: , "Goffredo Baroncelli"
>Cc: "systemd Mailing List", "linux-btrfs"

>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>
>On Jun 11, 2014, at 3:28 PM, Goffredo Baroncelli  wrote:
>> 
>> If someone is able to suggest me how FRAGMENT the log file, I can try to 
collect more scientific data.
>
>So long as you're not using compression, filefrag will show you fragments of 
systemd-journald journals. I can vouch for the behavior 
> you experience without xattr +C or autodefrag, but further it also causes 
much slowness when reading journal contents. LIke if I want to 
> search all boots for a particular error message to see how far back it 
started, this takes quite a bit longer than on other file systems. 
> So far I'm not experiencing this problem with autodefrag or any other 
negative side effects, but my understanding is this code is still in flux.
>
>Since the journals have their own checksumming I'm not overly concerned about 
setting xattr +C.

This is true; but it can be a general solution: the checksum of the data are 
needed during a 
scrub and/or a RAID rebuilding.

I want to investigate doing an explicit defrag once a week.


>
>Chris Murphy
G.Baroncelli

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] R: Re: Slow startup of systemd-journal on BTRFS

2014-06-12 Thread Goffredo Baroncelli


>Messaggio originale
>Da: russ...@coker.com.au
>Data: 12/06/2014 3.18
>A: 
>Cc: "systemd Mailing List", "linux-btrfs"

>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>On Wed, 11 Jun 2014 23:28:54 Goffredo Baroncelli wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1006386
>> 
>> suggested me that the problem could be due to a bad interaction between
>> systemd and btrfs. NetworkManager was innocent.  It seems that
>> systemd-journal create a very hight fragmented files when it stores its
>> log. And BTRFS it is know to behave slowly when a file is highly
>> fragmented. This had caused a slow startup of systemd-journal, which in
>> turn had blocked the services which depend by the loggin system.
>
>On my BTRFS/systemd systems I edit /etc/systemd/journald.conf and put 
>"SystemMaxUse=50M".  That doesn't solve the fragmentation problem but 
reduces 
>it enough that it doesn't bother me.

IIRC my log files are about 80/100MB. So I am not sure if this could help.
I want to investigate also the option

MaxFileSec=1d

which rotates the log file once a day (or a week)

>
>-- 
>My Main Blog http://etbe.coker.com.au/
>My Documents Bloghttp://doc.coker.com.au/
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] R: Re: Slow startup of systemd-journal on BTRFS

2014-06-12 Thread Goffredo Baroncelli
>Da: da...@fromorbit.com
>Data: 12/06/2014 3.21
>A: 
>Cc: "systemd Mailing List", "linux-btrfs"

>Ogg: Re: Slow startup of systemd-journal on BTRFS
>
>On Wed, Jun 11, 2014 at 11:28:54PM +0200, Goffredo Baroncelli wrote:
>> Hi all,
>> 
>> I would like to share a my experience about a slowness of systemd when used 
on BTRFS.
>> 
>> My boot time was very high (about ~50 seconds); most of time it was due to 
NetworkManager which took about 30-40 seconds to start (this data came from 
"systemd-analyze plot").
>> 
>> I make several attempts to address this issue. Also I noticed that sometime 
this problem disappeared; but I was never able to understand why.
>> 
>> However this link
>> 
>>  https://bugzilla.redhat.com/show_bug.cgi?id=1006386
>> 
>> suggested me that the problem could be due to a bad interaction between 
systemd and btrfs. NetworkManager was innocent. 
>
>systemd has a very stupid journal write pattern. It checks if there
>is space in the file for the write, and if not it fallocates the
>small amount of space it needs (it does *4 byte* fallocate calls!)
>and then does the write to it.  All this does is fragment the crap
>out of the log files because the filesystems cannot optimise the
>allocation patterns.

I checked the code, and to me it seems that the fallocate() are
done in FILE_SIZE_INCREASE unit (actually 8MB). 

>
>Yup, it fragments journal files on XFS, too.
>
>http://oss.sgi.com/archives/xfs/2014-03/msg00322.html
>
>IIRC, the systemd developers consider this a filesystem problem and
>so refused to change the systemd code to be nice to the filesystem
>allocators, even though they don't actually need to use fallocate...

If I am able to start a decent setup I would like to play to change some
parameters like:
- remove fallocate at all (at the beginning only ?)
- increase the fallocate allocation unit
- change the file log size and rotation time
- periodically defragment
[...[
>
>Cheers,
>
>Dave.
>
>-- 
>Dave Chinner
>da...@fromorbit.com
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] R: Re: Slow startup of systemd-journal on BTRFS

2014-06-13 Thread Goffredo Baroncelli
Hi Dave

On 06/13/2014 01:24 AM, Dave Chinner wrote:
> On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote:
>> Goffredo Baroncelli  posted on Thu, 12 Jun 2014
>> 13:13:26 +0200 as excerpted:
>>
>>>> systemd has a very stupid journal write pattern. It checks if there is
>>>> space in the file for the write, and if not it fallocates the small
>>>> amount of space it needs (it does *4 byte* fallocate calls!) and then
>>>> does the write to it.  All this does is fragment the crap out of the log
>>>> files because the filesystems cannot optimise the allocation patterns.
>>>
>>> I checked the code, and to me it seems that the fallocate() are done in
>>> FILE_SIZE_INCREASE unit (actually 8MB).
>>
>> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think 
>> actually pretty much equally bad without NOCOW set on the file.
> 
> So maybe it's been fixed in systemd since the last time I looked.
> Yup:
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
> 
> The reason it was changed? To "save a syscall per append", not to
> prevent fragmentation of the file, which was the problem everyone
> was complaining about...

thanks for pointing that. However I am performing my tests on a fedora 20 with 
systemd-208, which seems have this change
> 
>> Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for 
>> either 4 byte or 8 MiB file allocations is going to end up being the 
>> same, forcing (repeated until full) rewrite of each 4 KiB block into its 
>> own extent.


I am reaching the conclusion that fallocate is not the problem. The fallocate 
increase the filesize of about 8MB, which is enough for some logging. So it is 
not called very often.

I have to investigate more what happens when the log are copied from /run to 
/var/log/journal: this is when journald seems to slow all.

I am prepared a PC which reboot continuously; I am collecting the time required 
to finish the boot vs the fragmentation of the system.journal file vs the 
number of boot. The results are dramatic: after 20 reboot, the boot time 
increase of 20-30 seconds. Doing a defrag of system.journal reduces the boot 
time to the original one, but after another 20 reboot, the boot time still 
requires 20-30 seconds more

It is a slow PC, but I saw the same behavior also on a more modern pc (i5 with 
8GB).

For both PC the HD is a mechanical one...

> 
> And that's now a btrfs problem :/

Are you sure ?

ghigo@venice:/var/log$ sudo filefrag messages
messages: 29 extents found

ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal
journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found

So the old rsyslog creates files with fewer fragments. BTRFS (but it seems also 
xfs) for sure highlights more this problem than other filesystem. But also 
systemd seems to create a lot of extens.

BR
G.Baroncelli



> 
> Cheers,
> 
> Dave.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-14 Thread Goffredo Baroncelli
On 06/14/2014 04:53 AM, Duncan wrote:
> Goffredo Baroncelli posted on Sat, 14 Jun 2014 00:19:31 +0200 as
> excerpted:
> 
>> On 06/13/2014 01:24 AM, Dave Chinner wrote:
>>> On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote:
>>>>
>>>> FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think
>>>> actually pretty much equally bad without NOCOW set on the file.
>>>
>>> So maybe it's been fixed in systemd since the last time I looked.
>>> Yup:
>>>
>>> http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-
> file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58
>>>
>>> The reason it was changed? To "save a syscall per append", not to
>>> prevent fragmentation of the file, which was the problem everyone was
>>> complaining about...
>>
>> thanks for pointing that. However I am performing my tests on a fedora
>> 20 with systemd-208, which seems have this change
>>>
>>>> Why?  Because btrfs data blocks are 4 KiB.  With COW, the effect for
>>>> either 4 byte or 8 MiB file allocations is going to end up being the
>>>> same, forcing (repeated until full) rewrite of each 4 KiB block into
>>>> its own extent.
>>
>> I am reaching the conclusion that fallocate is not the problem. The
>> fallocate increase the filesize of about 8MB, which is enough for some
>> logging. So it is not called very often.
> 
> But... 
> 
> If a file isn't (properly[1]) set NOCOW (and the btrfs isn't mounted with 
> nodatacow), then an fallocate of 8 MiB will increase the file size by 8 
> MiB and write that out.  So far so good as at that point the 8 MiB should 
> be a single extent.  But then, data gets written into 4 KiB blocks of 
> that 8 MiB one at a time, and because btrfs is COW, the new data in the 
> block must be written to a new location.
> 
> Which effectively means that by the time the 8 MiB is filled, each 4 KiB 
> block has been rewritten to a new location and is now an extent unto 
> itself.  So now that 8 MiB is composed of 2048 new extents, each one a 
> single 4 KiB block in size.

Several people pointed fallocate as the problem. But I don't understand the 
reason.
1) 8MB is a quite huge value, so fallocate is called (at worst) 1 time during 
the boot. Often never because the log are less than 8MB.
2) it is true that btrfs "rewrite" almost 2 times each 4kb page with fallocate. 
But the first time is a "big" write of 8MB; instead the second write would 
happen in any case. What I mean is that without the fallocate in any case 
journald would make small write.

To be honest, I fatigue to see the gain of having a fallocate on a COW 
filesystem... may be that I don't understand very well the fallocate() call.

> 
> =:^(
> 
> Tho as I already stated, for file sizes upto 128 MiB or so anyway[2], the 
> btrfs autodefrag mount option should at least catch that and rewrite 
> (again), this time sequentially.
> 
>> I have to investigate more what happens when the log are copied from
>> /run to /var/log/journal: this is when journald seems to slow all.
> 
> That's an interesting point.
> 
> At least in theory, during normal operation journald will write to 
> /var/log/journal, but there's a point during boot at which it flushes the 
> information accumulated during boot from the volatile /run location to 
> the non-volatile /var/log location.  /That/ write, at least, should be 
> sequential, since there will be > 4 KiB of journal accumulated that needs 
> to be transferred at once.  However, if it's being handled by the forced 
> pre-write fallocate described above, then that's not going to be the 
> case, as it'll then be a rewrite of already fallocated file blocks and 
> thus will get COWed exactly as I described above.
> 
> =:^(
> 
> 
>> I am prepared a PC which reboot continuously; I am collecting the time
>> required to finish the boot vs the fragmentation of the system.journal
>> file vs the number of boot. The results are dramatic: after 20 reboot,
>> the boot time increase of 20-30 seconds. Doing a defrag of
>> system.journal reduces the boot time to the original one, but after
>> another 20 reboot, the boot time still requires 20-30 seconds more
>>
>> It is a slow PC, but I saw the same behavior also on a more modern pc
>> (i5 with 8GB).
>>
>> For both PC the HD is a mechanical one...
> 
> The problem's duplicable.  That's the first step toward a fix. =:^)
I Hope so
> 
>>> And that's now a btrfs problem :/
>>
>> Are you sure ?
> 
> As th

Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-14 Thread Goffredo Baroncelli
On 06/14/2014 12:59 PM, Kai Krakow wrote:
[...]
> 
> I think that systemd is even one of the early supporters of btrfs because it 
> will defragment readahead files on boot from btrfs. 

In know that systemd does readahead, but it is the first time that I heard that 
it does defrag too. Could you elaborate ?

> I'd suggest the problem 
> is to be found in the different semantics with COW filesystems. 
[--cut--cut--cut--]

The problem is due to the interaction between BTRFS and systemd. Both behave 
differently regarding their predecessor. 

[--cut--cut--cut--]

> So let's start with my journals, on btrfs:
> 
> $ sudo filefrag *
> system@0004fad12dae7676-98627a3d7df4e35e.journal~: 2 extents found
> system@0004fae8ea4b84a4-3a2dc4a93c5f7dc9.journal~: 2 extents found
> system@806cd49faa074a49b6cde5ff6fca8adc-0008e4cc-0004f82580cdcb45.journal:
>  
> 5 extents found
> system@806cd49faa074a49b6cde5ff6fca8adc-00097959-0004f89c2e8aff87.journal:
>  
> 5 extents found   
>
> system@806cd49faa074a49b6cde5ff6fca8adc-000a166d-0004f98d7e04157c.journal:
>  
> 5 extents found   
>
> system@806cd49faa074a49b6cde5ff6fca8adc-000aad59-0004fa379b9a1fdf.journal:
>  
> 5 extents found
> system@ec16f60db38f43619f8337153a1cc024-0001-0004fae8e5057259.journal:
>  
> 5 extents found
> system@ec16f60db38f43619f8337153a1cc024-92b1-0004fb59b1d034ad.journal:
>  
> 5 extents found
> system.journal: 9 extents found
> user-500@e4209c6628ed4a65954678b8011ad73f-00085b7a-0004f77d25ebba04.journal:
>  
> 2 extents found
> user-500@e4209c6628ed4a65954678b8011ad73f-0008e7fb-0004f83c7bf18294.journal:
>  
> 2 extents found
> user-500@e4209c6628ed4a65954678b8011ad73f-00097fe4-0004f8ae69c198ca.journal:
>  
> 2 extents found
> user-500@e4209c6628ed4a65954678b8011ad73f-000a1a7e-0004f9966e9c69d8.journal:
>  
> 2 extents found
> user-500.journal: 2 extents found
> 
> I don't think these are too bad values, eh?
> 
> Well, how did I accomblish that?
> 
> First, I've set the journal directories nocow. 

If you use nocow, you lost the btrfs ability to rebuild a RAID array discarding 
the wrong sector. With the systemd journal checksum, you can say that a data is 
wrong, but BTRFS with its checksum (when used in raid mode) is able to 
*correct* the data too.

[...]


> 
> Back to the extents counts: What I did next was implementing a defrag job 
> that regularly defrags the journal (actually, the complete log directory as 
> other log files suffer the same problem):
> 
> $ cat /usr/local/sbin/defrag-logs.rb 
> #!/bin/sh
> exec btrfs filesystem defragment -czlib -r /var/log

I think that this is a more viable solution; but what seems to me strange is 
the fact that the fragmentation level of my rsyslog file is a lot lower.

[...]


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-14 Thread Goffredo Baroncelli
On 06/14/2014 04:03 PM, Kai Krakow wrote:
> Goffredo Baroncelli  schrieb:
> 
>> On 06/14/2014 12:59 PM, Kai Krakow wrote:
>> [...]
>>>
>>> I think that systemd is even one of the early supporters of btrfs because
>>> it will defragment readahead files on boot from btrfs.
>>
>> In know that systemd does readahead, but it is the first time that I heard
>> that it does defrag too. Could you elaborate ?
> 
> Look at src/readahead/readahead-collect.c. It works for btrfs and ext4. With 
> ext4 it will also relocate the files. Not sure if it does for btrfs. 

I am looking at the source, and yes, it does. To be honest it seems to 
defragment only on btrfs.

> If it 
> does, the question is: where to relocate in a multi devices file system?

Systemd uses the defrag capability of btrfs.

 
> That means, you have to enable systemd-readahead-collect, tho.
I have to admit that I disabled it. I will make some test also with readhead 
enabled. 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-14 Thread Goffredo Baroncelli
On 06/14/2014 10:13 PM, Kai Krakow wrote:
> Goffredo Baroncelli  schrieb:
> 
>> I am looking at the source, and yes, it does. To be honest it seems to
>> defragment only on btrfs.
> 
> Oh that seems true... Then defragmenting and relocating is probably a todo 
> item. I'm sure I've read about it somewhere.
> 
>>> If it
>>> does, the question is: where to relocate in a multi devices file system?
>>
>> Systemd uses the defrag capability of btrfs.
> 
> man:systemd-readahead says it relocates and defrags if supported. Scanning 
> through the source, it only defrags.
> 
>>> That means, you have to enable systemd-readahead-collect, tho.
>> I have to admit that I disabled it. I will make some test also with
>> readhead enabled.
> 
> Take care to enable all needed services to enable defrag... If your services 
> make use of journal file loading these files should also become part of the 
> process. You can check with "/usr/lib/systemd/systemd-readahead analyze". 
> The state is stored in /.readahead.
> 
I have enabled all the services (collect, replay, done), but I was unable to 
see any gain. 

I don't know why but system.journal is not considered by readahead:

# /usr/lib/systemd/systemd-readahead analyze | grep journal


   100% ( 1)  770: /etc/systemd/journald.conf
50% ( 1)  4194304: 
/var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-997.journal
50% ( 1)  4194304: 
/var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-1000.journal


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-15 Thread Goffredo Baroncelli
On 06/14/2014 11:37 PM, Kai Krakow wrote:
> Goffredo Baroncelli  schrieb:
> 
>> On 06/14/2014 10:13 PM, Kai Krakow wrote:
>>> Take care to enable all needed services to enable defrag... If your
>>> services make use of journal file loading these files should also become
>>> part of the process. You can check with
>>> "/usr/lib/systemd/systemd-readahead analyze". The state is stored in
>>> /.readahead.
>>>
>> I have enabled all the services (collect, replay, done), but I was unable
>> to see any gain.
> 
> I don't think you can actually enable the "done" service because it is 
> static... But if it placed a symlink in /etc/systemd that is probably wrong 
> and you should undo it. It is automatically activated a few seconds after 
> booting is done to store the collected data for replaying on next reboot.

I checked the output of systemd-analyze: systemd-readahead-done is called at 
the end of the chart. The sym-links exist only for the collect and replay 
services, which were called at the beginning.

> 
> Check with "find /etc/systemd -name '*readahead*' -type l"...
> 
>> I don't know why but system.journal is not considered by readahead:
>>
>> # /usr/lib/systemd/systemd-readahead analyze | grep journal
>>
>>
>>100% ( 1)  770: /etc/systemd/journald.conf
>> 50% ( 1)  4194304:
>> /var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-997.journal
>> 50% ( 1)  4194304:
>> /var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-1000.journal
> 
> I tried that on my system and it looks vice-versa somehow:
> 
> $ sudo /usr/lib/systemd/systemd-readahead analyze|fgrep journal
>100% ( 1)  812: /etc/systemd/journald.conf
>100% ( 1)  8388608: 
> /var/log/journal/121b87ca633e8ac001665668001b/system.journal
> 
> Strange... ;-)

May be this is a reason of my degraded performance. However I tried on two 
machines (all fedora) and after several reboot; I tried also removing the 
/.readahead file...
Nothing.


On another fedora machine, I found

$ sudo /lib/systemd/systemd-readahead analyze | grep journal
   100% ( 1)   220384: /usr/lib/systemd/systemd-journald
   100% ( 1)  770: /etc/systemd/journald.conf
   100% ( 1)24544: /usr/lib64/rsyslog/imjournal.so
   100% ( 1)   122880: /usr/lib64/libsystemd-journal.so.0.11.3
50% ( 1)  4194304: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-1000@0004f1b1019bc549-09d9ae048ff8eeef.journal~
   100% ( 1)  8388608: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-1000@0004f1b7dcfd728c-38950862a134194d.journal~
50% ( 1)  4194304: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-995@0004f1c3fa738010-b0829bd837e30f61.journal~
50% ( 1)  4194304: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-995.journal
   100% ( 1)  8388608: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/system@0004f6ed6604ebc7-bb2f3b30ac6eb32c.journal~
   100% ( 1)  8388608: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-1000@0004f6ed66a05b08-9d244b02cc294baf.journal~
77% ( 1)  6508544: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/system@0004f6ed6d02e69f-d9c815aa06264d7a.journal~
   100% ( 1)  8388608: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-1000@0004f6ed6db6dd7d-cc84da57d5b01680.journal~
   100% ( 1)  8388608: 
/var/log/journal/189323cd4cc348159b9fd5b32b566b05/user-1000.journal
   100% ( 1)  121: /var/lib/rsyslog/imjournal.state


So some journal files are present, but not the current one (system.journal)



> To get back to your performance problem: Did you try "systemctl enable 
> bootchart" and then (after a reboot) look at the png generated in /run/log 
> (at least in my system it is placed there, look at the configs if in doubt). 
> It can give some interesting clues about why a service starts late or takes 
> long to execute.
Unfortunately bootchart doesn't work on fedora, because CONFIG_SCHEDSTATS is 
not enabled in the kernel.

 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-15 Thread Goffredo Baroncelli
On 06/15/2014 12:50 PM, Kai Krakow wrote:
> Goffredo Baroncelli  schrieb:
> 
>>>> I have enabled all the services (collect, replay, done), but I was
>>>> unable to see any gain.
>>>
>>> I don't think you can actually enable the "done" service because it is
>>> static... But if it placed a symlink in /etc/systemd that is probably
>>> wrong and you should undo it. It is automatically activated a few seconds
>>> after booting is done to store the collected data for replaying on next
>>> reboot.
>>
>> I checked the output of systemd-analyze: systemd-readahead-done is called
>> at the end of the chart. The sym-links exist only for the collect and
>> replay services, which were called at the beginning.
> 
> Then all is good.
> 
>>>> I don't know why but system.journal is not considered by readahead:
>>>>
>>>> # /usr/lib/systemd/systemd-readahead analyze | grep journal
>>>>
>>>>
>>>>100% ( 1)  770: /etc/systemd/journald.conf
>>>> 50% ( 1)  4194304:
>>>> /var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-997.journal
>>>> 50% ( 1)  4194304:
>>>> /var/log/journal/36f10f5379ec4a1398ac303a0ce20fd0/user-1000.journal
>>>
>>> I tried that on my system and it looks vice-versa somehow:
>>>
>>> $ sudo /usr/lib/systemd/systemd-readahead analyze|fgrep journal
>>>100% ( 1)  812: /etc/systemd/journald.conf
>>>100% ( 1)  8388608:
>>> /var/log/journal/121b87ca633e8ac001665668001b/system.journal
>>>
>>> Strange... ;-)
>>
>> May be this is a reason of my degraded performance. However I tried on two
>> machines (all fedora) and after several reboot; I tried also removing the
>> /.readahead file... Nothing.
> 
> I'm not sure but generally it is said that readahead utilities may need a 
> few reboots before fully working. Did you try that? Will it take the journal 
> into account then?

I hope that today I will publish my results. Anyway I performed about 50-70 
reboot on each configuration.

> 
> Maybe there is something special about your setup and your system.journal 
> does not become touched during boot because of this? You said you also use 
> rsyslogd. Maybe logs are redirected there instead of written to both logging 
> mechanisms? Maybe most logging by system daemons is turned off or they write 
> directly to log files?
> 
> I tuned all my daemons and services to not write to any log files at all, 
> instead log everything to stderr/stdout (which is captured by systemd) or 
> into the syslogger which is journald as the only mechanism in my system. I 
> just prefer to have everything in one place, and journalctl is a great way 
> to filter the logs or create custom views. Maybe that's a reason why it is 
> taken into account by readahead on my system? I can only guess but maybe 
> worth a try for you...

I don't have any idea... definitely I will investigate a bit. 

> 
>>> To get back to your performance problem: Did you try "systemctl enable
>>> bootchart" and then (after a reboot) look at the png generated in
>>> /run/log (at least in my system it is placed there, look at the configs
>>> if in doubt). It can give some interesting clues about why a service
>>> starts late or takes long to execute.
>> Unfortunately bootchart doesn't work on fedora, because CONFIG_SCHEDSTATS
>> is not enabled in the kernel.
> 
> Well, I'm on Gentoo so my kernel is custom built by design. But I checked my 
> config, I even have no config option "CONFIG_SCHEDSTATS" but bootchart works 
> for me:

I am talking about systemd-bootchart. I have to look if Fedora has the classic 
bootchart.



> 
> $ zfgrep SCHED /proc/config.gz 
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
> CONFIG_CGROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_RT_GROUP_SCHED=y
> CONFIG_SCHED_AUTOGROUP=y
> CONFIG_IOSCHED_NOOP=y
> CONFIG_IOSCHED_DEADLINE=y
> CONFIG_IOSCHED_CFQ=y
> CONFIG_CFQ_GROUP_IOSCHED=y
> CONFIG_IOSCHED_BFQ=y
> CONFIG_DEFAULT_IOSCHED="deadline"
> CONFIG_SCHED_OMIT_FRAME_POINTER=y
> CONFIG_SCHED_SMT=y
> CONFIG_SCHED_MC=y
> CONFIG_SCHED_HRTICK=y
> # CONFIG_NET_SCHED is not set
> CONFIG_USB_EHCI_TT_NEWSCHED=y
> 
> BTW: After migrating to systemd, I had a boot time of 15s to DM on 
> rotational with serveral heavy services enabled. I did not measure time to 
> desktop back then. But after I did a btrfs balance (I use multidevice 
> btrfs), the time to DM went up to around 25s and never went back no matt

Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-16 Thread Goffredo Baroncelli
Hi Lennart,

On 06/16/2014 12:13 AM, Lennart Poettering wrote:
> I am not really following though why this trips up btrfs though. I am
> not sure I understand why this breaks btrfs COW behaviour. I mean,
> fallocate() isn't necessarily supposed to write anything really, it's
> mostly about allocating disk space in advance. I would claim that
> journald's usage of it is very much within the entire reason why it
> exists...

I performed several tests, trying different setups [1]. One of these was 
replacing the posix_fallocate() with a truncate, to check where is the problem. 
The conclusion was that *posix_fallocate() is NOT the problem*.

In another reply you stated that systemd-journald appends some data at the end 
of file, then update some data in the middle. I think this is the reason 
because the file becomes quickly fragmented.


[1] Let me to revise the english, the I will post the results.


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-16 Thread Goffredo Baroncelli
Hi all,

in this blog [1] I collected all the results of the tests which I performed in 
order to investigate a bit this performance problem between systemd and btrfs. 
I had to put these results in a blog, because there are several images. Below a 
brief summary.

I took an old machine (a P4 2.5GHz with 512MB of ram) where was present a fresh 
installation of a Fedora 20 and I measured the boot time during several reboots 
(up to 70).  I tested the following scenarios

1) standard (without defragmenting any file, without readahead)
2) defragment the journal file at the end of the boot
3) defragment the journal file before the flushing
4) mark as NOCOW the journald log file
5) enable the systemd-readahead
6) remove the fsync(2) call from journald
7) remove the posix_fallocate(3) call from journald
8) do a defrag when posix_fallocate(3) is called
9) do a defrag when the journal log file is opened

The test #1 highlight the problem. It shows that the boot time may require up 
to 50 seconds. During the reboots the number of extents of the file 
system.journal increases up to 6000. De-fragmenting the system.journal file the 
boot time decreases by ~20 seconds. My conclusion is that in BTRFS the 
fragmentation of this file increases the boot time.

The test #6 and #7 suggested that the fsync(2) amd posix_fallocate(3) calls 
aren't the root cause of the problem. Even without these the system.journal 
file still fragments.

The test #4 suggested that marking NOCOW the system.journal file reduces its 
fragmentation and so the boot time.

The test #2,#3,#9 suggested that performing a periodic defrag reduces 
significantly the fragmentation of system.journal and so the boot time.

The test #5 revealed that the readahead capability of systemd was not efficacy 
because it seems that the system.journal file was unaffected (but other 
*.journal files were). Further investigation is required.

BR
G.Baroncelli



[1] http://kreijack.blogspot.it/2014/06/btrfs-and-systemd-journal.html

On 06/16/2014 06:32 PM, Goffredo Baroncelli wrote:
> Hi Lennart,
> 
> On 06/16/2014 12:13 AM, Lennart Poettering wrote:
>> I am not really following though why this trips up btrfs though. I am
>> not sure I understand why this breaks btrfs COW behaviour. I mean,
>> fallocate() isn't necessarily supposed to write anything really, it's
>> mostly about allocating disk space in advance. I would claim that
>> journald's usage of it is very much within the entire reason why it
>> exists...
> 
> I performed several tests, trying different setups [1]. One of these was 
> replacing the posix_fallocate() with a truncate, to check where is the 
> problem. The conclusion was that *posix_fallocate() is NOT the problem*.
> 
> In another reply you stated that systemd-journald appends some data at the 
> end of file, then update some data in the middle. I think this is the reason 
> because the file becomes quickly fragmented.
> 
> 
> [1] Let me to revise the english, the I will post the results.
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow startup of systemd-journal on BTRFS

2014-06-17 Thread Goffredo Baroncelli
Hi Kai,


I investigate a bit why readahead doesn't defrag the journal.
I put in CC also the mailing list for "the record".

On 06/17/2014 03:33 AM, Kai Krakow wrote:
[...]
> Instead, for me, the readahead collector catches access to my system
> journal and thus defragments it. That's not the case for your system
> which explains why it won't be defragmented for you when enabling
> readahead.

The readahead program skips file greater than 10M. If you look at its man page, 
it documents the switch --file-size-max. The default value is 
READAHEAD_FILE_SIZE_MAX which is equal to 10M.
So only when the system.journal is smaller than this value, the readahead 
defrags it.
But this happens only few times. 

This behavior seems to me reasonable to avoid blocking a system when a big file 
is touched during the boot.

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 10/11] Avoid doubling the efforts for /var/log/journal

2014-06-18 Thread Goffredo Baroncelli
Hi Fink

On 06/13/2014 04:41 PM, Werner Fink wrote:
> That is: set NOATIME, NOCOW, and NOCOMP for the journal directory
> 
> ---
>  src/journal/journald-server.c |   29 +++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git src/journal/journald-server.c src/journal/journald-server.c
> index eda5dcf..37d6dc3 100644
> --- src/journal/journald-server.c
> +++ src/journal/journald-server.c
> @@ -21,6 +21,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -920,7 +921,7 @@ finish:
>  
>  
>  static int system_journal_open(Server *s) {
> -int r;
> +int r, fd;
>  char *fn;
>  sd_id128_t machine;
>  char ids[33];
> @@ -947,7 +948,31 @@ static int system_journal_open(Server *s) {
>  (void) mkdir("/var/log/journal/", 0755);
>  
>  fn = strappenda("/var/log/journal/", ids);
> -(void) mkdir(fn, 0755);
> +(void)mkdir(fn, 0755);
> +
> +/*
> + * On journaling and/or compressing file systems avoid 
> doubling the
> + * efforts for the system, that is set NOCOW and NOCOMP 
> inode flags.
> + * Check for every single flag as otherwise some of the file 
> systems
> + * may return EOPNOTSUPP on one unkown flag (like BtrFS 
> does).
> + */
> +if ((fd = open(fn, O_DIRECTORY)) >= 0) {
> +long flags;
> +if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
> +int old = flags;
> +if (!(flags&FS_NOATIME_FL) && ioctl(fd, 
> FS_IOC_SETFLAGS, flags|FS_NOATIME_FL) == 0)
> +flags |= FS_NOATIME_FL;
> +if (!(flags&FS_NOCOW_FL) && ioctl(fd, 
> FS_IOC_SETFLAGS, flags|FS_NOCOW_FL) == 0)
> +flags |= FS_NOCOW_FL;

If I read correctly, you want set UN-conditionally the NOCOW behavior. Please, 
please, please DON'T DO that.

The NOCOW behavior is not without disadvantage: yes it increase the performance 
but
the file also lost the btrfs checksum protection; when BTRFS manage the disks 
in RAID mode and a corruption happens, it uses the checksum to select the 
correct mirror during the reading. If you set UN-conditionally the NOCOW 
behavior you lost this capability even if the user _want it_ (and if they spend 
moneys in two or more disks, it is likely they _want it_).

Moreover the NOCOW flags has some "strange" behavior when a NOCOW file is 
snapshotted (it lost the NOCOW property); this may lead to irregular 
performance.

If you want it, it must be configurable at least with a sane default (which 
IMHO should be "do nothing", following the "least surprise" rule).

If you are looking to something like that, I suggest also to defrag the journal 
file before the open (but still as configurable option, and considering the 
"least surprise" rule).

BR
G.Baroncelli

> +if (!(flags&FS_NOCOMP_FL) && s->compress) {
> +flags &= ~FS_COMPR_FL;
> +flags |= FS_NOCOMP_FL;
> +}
> +if (old != flags)
> +ioctl(fd, FS_IOC_SETFLAGS, flags);
> +}
> +    close(fd);
> +}
>  
>  fn = strappenda(fn, "/system.journal");
>  r = journal_file_open_reliably(fn, O_RDWR|O_CREAT, 0640, 
> s->compress, s->seal, &s->system_metrics, s->mmap, NULL, &s->system_journal);
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] Split sysctl 50-default.conf setting file

2013-12-02 Thread Goffredo Baroncelli
Hi all,

currently systemd contains a sysctl default setting in a file called
50-default.conf
The aim of this patch is to split the content of the sysctl setting in
more files to allow a more selective override.

My need is to enable all the sysrq key. Instead systemd defaults is to
disallow all sysrq keys except the sync one [1].
To do that, I would have to override the sysctl file
/usr/lib/sysctl.d/50-default.conf file,
putting a file with the same name in
/etc/sysctl.d
However this file contains other settings than the one which I want to
override; so I would lost any update of these other settings made by
upstream.

With this patch I am able to override only the setting related to the sysrq.

Please apply.

BR
G.Baroncelli


[1] For the record, I am against this kind of setting. I opened a bug
in debian (#725422), but it was suggested me to send a patch to upstream.
Of course it is in the systemd right to set whatever default it thinks sane.

Signed-off-by: Goffredo Baroncelli 
---
 Makefile.am   |  4 +++-
 sysctl.d/50-coredump.conf.in  |  3 +++
 sysctl.d/50-default.conf  | 24 
 sysctl.d/50-default_fs.conf   | 12 
 sysctl.d/50-default_kernel_sysrq.conf | 26 ++
 sysctl.d/50-default_net.conf  | 14 ++
 6 files changed, 58 insertions(+), 25 deletions(-)
 delete mode 100644 sysctl.d/50-default.conf
 create mode 100644 sysctl.d/50-default_fs.conf
 create mode 100644 sysctl.d/50-default_kernel_sysrq.conf
 create mode 100644 sysctl.d/50-default_net.conf

diff --git a/Makefile.am b/Makefile.am
index 90874df..95087c6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -357,7 +357,9 @@ dist_zshcompletion_DATA = \
shell-completion/zsh/_systemd
  dist_sysctl_DATA = \
-   sysctl.d/50-default.conf
+   sysctl.d/50-default_kernel_sysrq.conf \
+   sysctl.d/50-default_net.conf \
+   sysctl.d/50-default_fs.conf
  dist_systemunit_DATA = \
units/graphical.target \
diff --git a/sysctl.d/50-coredump.conf.in b/sysctl.d/50-coredump.conf.in
index d5795a3..1db1047 100644
--- a/sysctl.d/50-coredump.conf.in
+++ b/sysctl.d/50-coredump.conf.in
@@ -8,3 +8,6 @@
 # See sysctl.d(5) and core(5) for for details.
  kernel.core_pattern=|@rootlibexecdir@/systemd-coredump %p %u %g %s %t %e
+
+# Append the PID to the core filename
+kernel.core_uses_pid = 1
diff --git a/sysctl.d/50-default.conf b/sysctl.d/50-default.conf
deleted file mode 100644
index 46bae21..000
--- a/sysctl.d/50-default.conf
+++ /dev/null
@@ -1,24 +0,0 @@
-#  This file is part of systemd.
-#
-#  systemd is free software; you can redistribute it and/or modify it
-#  under the terms of the GNU Lesser General Public License as published by
-#  the Free Software Foundation; either version 2.1 of the License, or
-#  (at your option) any later version.
-
-# See sysctl.d(5) and core(5) for for details.
-
-# System Request functionality of the kernel (SYNC)
-kernel.sysrq = 16
-
-# Append the PID to the core filename
-kernel.core_uses_pid = 1
-
-# Source route verification
-net.ipv4.conf.default.rp_filter = 1
-
-# Do not accept source routing
-net.ipv4.conf.default.accept_source_route = 0
-
-# Enable hard and soft link protection
-fs.protected_hardlinks = 1
-fs.protected_symlinks = 1
diff --git a/sysctl.d/50-default_fs.conf b/sysctl.d/50-default_fs.conf
new file mode 100644
index 000..a2e7eb4
--- /dev/null
+++ b/sysctl.d/50-default_fs.conf
@@ -0,0 +1,12 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See sysctl.d(5) for for details.
+
+# Enable hard and soft link protection
+fs.protected_hardlinks = 1
+fs.protected_symlinks = 1
diff --git a/sysctl.d/50-default_kernel_sysrq.conf
b/sysctl.d/50-default_kernel_sysrq.conf
new file mode 100644
index 000..a848745
--- /dev/null
+++ b/sysctl.d/50-default_kernel_sysrq.conf
@@ -0,0 +1,26 @@
+#  This file is part of systemd.
+#
+#  systemd is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU Lesser General Public License as published by
+#  the Free Software Foundation; either version 2.1 of the License, or
+#  (at your option) any later version.
+
+# See sysctl.d(5) for for details.
+
+# From Documentation/sysrq.txt: possible value to control which sysrq
+# could be invoked from keyboard
+#
+#   0 - disable sysrq completely
+#   1 - enable all functions of sysrq
+#  >1 - bitmask of allowed sysrq functions (see below for detailed function
+#   description):
+#  2 - enable control of console logging level
+#  4 - enable control of keyboard (SAK, unraw)
+#  8 - enable debugging dumps of processes etc.
+# 16 - enable sync comm

Re: [systemd-devel] [PATCH] Split sysctl 50-default.conf setting file

2013-12-02 Thread Goffredo Baroncelli
On 2013-12-02 21:32, Kay Sievers wrote:
> On Mon, Dec 2, 2013 at 9:15 PM, Goffredo Baroncelli  
> wrote:
>> currently systemd contains a sysctl default setting in a file called
>> 50-default.conf
>> The aim of this patch is to split the content of the sysctl setting in
>> more files to allow a more selective override.
>>
>> My need is to enable all the sysrq key. Instead systemd defaults is to
>> disallow all sysrq keys except the sync one [1].
>> To do that, I would have to override the sysctl file
>> /usr/lib/sysctl.d/50-default.conf file,
>> putting a file with the same name in
>> /etc/sysctl.d
>> However this file contains other settings than the one which I want to
>> override; so I would lost any update of these other settings made by
>> upstream.
>>
>> With this patch I am able to override only the setting related to the sysrq.
> 
> You should be able to overwrite individual settings just fine. I don't
> think this is needed.

What happens if the same sysctl is present is in two files: the value is
written two times, or systemd-sysctl is smart enough to write only the
last one ?

I have to point out that I spent some time to find who changed this
setting when I installed systemd. A more explicit name file would helped.

> 
>> create mode 100644 sysctl.d/50-default_kernel_sysrq.conf
> 
> We usually don't do "_" in file names. :)

Just for curiosity: there is a rationale or it is a convention (I am
fine with removing "_", but I am curious about the reason)
> 
> Kay
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Split sysctl 50-default.conf setting file

2013-12-02 Thread Goffredo Baroncelli
Hi Zbyszek
On 2013-12-02 23:27, Zbigniew Jędrzejewski-Szmek wrote:
> On Mon, Dec 02, 2013 at 10:27:45PM +0100, Goffredo Baroncelli wrote:
[...]
>>
>> Yes am doing so. But IIRC the process order of the sysctl file was
>> inverted near systemd 207...
>>
>> Because Debian uses 204, when it switches to something more recent than
>> 207 this setup will not work any more :-( so I have to change the order
>> number.
> Yes, that unfortunate :), but easy to work around: just install the file
> with a high number, and symlink with a low number. The symlink can be removed
> after update to 208.

Thanks, good suggestions
> 
>> Anyway I think that it is more clean to separate the setting in more files.
> This would make the number of files equal to the number of settings we are
> changing, which would be messy.

This is not the first case that a config file is split in several
sub-files. The .d directories are a typical example.

I have ne question: what happens if a sysctl setting is in more than
one file ? systemd-sysctl is smart enough to write the last value or
 perform several writes ?


>>> BTW, Kay, why is the default so conservative here (sysrq only)?
>>> I would think that the general principle that the user who has physical
>>> access to the machine and can flip the power switch should be able to
>>> do various things which are disruptive, but not are not proviledge
>>> escalation (let's call them reboot-like).
>>
>> I agree with you
> Kay explained in IRC that we do not allow such actions, because access to
> the keyboad doesn't mean full access to the machine, and we default to safe
> settings. Allowing the reboot though logind is different, because the user
> must authenticate first to open a session.

Sorry, but I cannot agree: from a theoretical point of view Kay has
reason. However who has access to the keyboard and not to the "power
switch" ? If I want to switch the PC and the software cannot allow it, I
unplug the main power...

I think that we should give access to other keys like:
- Boot
- Reboot
- powerOff
- Umount

- often my Xorg freez and syrq-K is also useful

Goffredo

> Zbyszek
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] systemd crashes if locale.conf contains invalid utf8 string

2014-02-06 Thread Goffredo Baroncelli
In the parse_env_file_push() and load_env_file_push() functions, there
are two assert() call to check if the key or value parameters are utf8 valid.

If the strings aren't utf8 valid, assert does abort.

These function are used early by systemd to parse some files. For 
example '/etc/locale.conf'. In my case this file contained a not utf8
sequence, which is bad, but systemd crashed during the boot, which
is even worse !

The enclosed patch removes the assert and return -EINVAL if the
sequence is invalid. This is possible because the caller of these
function [1] checks the errors.
So the check of an invalid utf8 sequence is still performed, but
systemd doesn't crash anymore and logs the error.

BR
G.Baroncelli



[1] parse_env_file_internal(), invoked by load_env_file() and
parse_env_file()

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



diff --git a/src/shared/fileio.c b/src/shared/fileio.c
index ede8819..38af34b 100644
--- a/src/shared/fileio.c
+++ b/src/shared/fileio.c
@@ -534,35 +534,41 @@ fail:
 
 static int parse_env_file_push(const char *filename, unsigned line,
const char *key, char *value, void *userdata) {
-assert(utf8_is_valid(key));
 
-if (value && !utf8_is_valid(value))
+const char *k;
+va_list* ap = (va_list*) userdata;
+va_list aq;
+
+if (!utf8_is_valid(key)) {
+log_error("%s:%u: invalid UTF-8 for key '%s', ignoring.",
+  filename, line, key);
+return -EINVAL;
+}
+
+if (value && !utf8_is_valid(value)) {
 /* FIXME: filter UTF-8 */
-log_error("%s:%u: invalid UTF-8 for key %s: '%s', ignoring.",
+log_error("%s:%u: invalid UTF-8 value for key %s: '%s', 
ignoring.",
   filename, line, key, value);
-else {
-const char *k;
-va_list* ap = (va_list*) userdata;
-va_list aq;
+return -EINVAL;
+}
 
-va_copy(aq, *ap);
+va_copy(aq, *ap);
 
-while ((k = va_arg(aq, const char *))) {
-char **v;
+while ((k = va_arg(aq, const char *))) {
+char **v;
 
-v = va_arg(aq, char **);
+v = va_arg(aq, char **);
 
-if (streq(key, k)) {
-va_end(aq);
-free(*v);
-*v = value;
-return 1;
-}
+if (streq(key, k)) {
+va_end(aq);
+free(*v);
+*v = value;
+return 1;
 }
-
-va_end(aq);
 }
 
+va_end(aq);
+
 free(value);
 return 0;
 }
@@ -586,26 +592,31 @@ int parse_env_file(
 
 static int load_env_file_push(const char *filename, unsigned line,
   const char *key, char *value, void *userdata) {
-assert(utf8_is_valid(key));
+char ***m = userdata;
+char *p;
+int r;
 
-if (value && !utf8_is_valid(value))
+if (!utf8_is_valid(key)) {
+log_error("%s:%u: invalid UTF-8 for key '%s', ignoring.",
+  filename, line, key);
+return -EINVAL;
+}
+
+if (value && !utf8_is_valid(value)) {
 /* FIXME: filter UTF-8 */
-log_error("%s:%u: invalid UTF-8 for key %s: '%s', ignoring.",
+log_error("%s:%u: invalid UTF-8 value for key %s: '%s', 
ignoring.",
   filename, line, key, value);
-else {
-char ***m = userdata;
-char *p;
-int r;
+return -EINVAL;
+}
 
-p = strjoin(key, "=", strempty(value), NULL);
-if (!p)
-return -ENOMEM;
+p = strjoin(key, "=", strempty(value), NULL);
+if (!p)
+return -ENOMEM;
 
-r = strv_push(m, p);
-if (r < 0) {
-free(p);
-return r;
-}
+r = strv_push(m, p);
+if (r < 0) {
+free(p);
+return r;
 }
 
 free(value);


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] systemd crashes if locale.conf contains invalid utf8 string

2014-02-08 Thread Goffredo Baroncelli
On 02/08/2014 07:15 PM, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Feb 06, 2014 at 07:09:59PM +0100, Goffredo Baroncelli wrote:
[...]
> Applied.

Great !

[...]

> Zbyszek
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel