Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Jul 28, 2016 at 03:41:38PM +0300, Gilboa Davara wrote:
> On Thu, Jul 28, 2016 at 3:23 PM, Tomasz Torcz  wrote:
> > On Thu, Jul 28, 2016 at 03:04:12PM +0300, Gilboa Davara wrote:
> >> On Thu, Jul 28, 2016 at 6:07 AM, Zbigniew Jędrzejewski-Szmek
> >>  wrote:
> >> >
> >> > It is possible that udevd is failing for whatever reason... but apart
> >> > from the fact that some of the devices links are missing you don't
> >> > provide any info. At the minimum: boot logs, and information which links
> >> > are missing.
> >>
> >> Boot log info is pretty scarce. I only see a lot of systemd timed-out
> >> log messages:
> >> E.g.
> >>
> >> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
> >> dev-VolRoot-LogStorageMData_P123.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P125.device: Job
> >> dev-VolRoot-LogStorageMData_P125.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P124.device: Job
> >> dev-VolRoot-LogStorageMData_P124.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P122.device: Job
> >> dev-VolRoot-LogStorageMData_P122.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P127.device: Job
> >> dev-VolRoot-LogStorageMData_P127.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
> >> dev-VolRoot-LogStorageMData_P123.device/start timed out.
> >> systemd[1]: dev-VolRoot-LogStorageMData_P126.device: Job
> >> dev-VolRoot-LogStorageMData_P126.device/start timed out.
> >>
> >> I did see some udev / LVM error messages:
> >> systemd-udevd[2181]: fork of '/usr/sbin/dmsetup splitname
> >> --nameprefixes --noheadings --rows VolRoot-LogStorageMData_P96'
> >> failed: Resource temporarily unavailable
> >
> > Hm, do you have TasksMax=infinity in systemd-udevd.service?
> > (commit de2edc008a612e152f0690d5063d53001c4e13ff)
> >
> 
> Nope. I don't see TasksMax in systemd-udevd.service.
systemd added a default TasksMax limit for units.
TasksMax=infinity lifts that limit for systemd-udevd.
I now built //koji.fedoraproject.org/koji/taskinfo?taskID=15052388
with that patch applied for F24.
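
For anyone who cannot install the patched build right away, a local
drop-in should have the same effect. This is only a sketch: the function
name, the drop-in file name (tasksmax.conf) and the default directory
/etc/systemd/system are my assumptions, not something taken from the patch.

```shell
#!/bin/sh
# Sketch: write a drop-in that sets TasksMax=infinity for one unit.
# $1: unit name, e.g. systemd-udevd.service
# $2: systemd config root (assumed default: /etc/systemd/system)
write_tasksmax_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # tasksmax.conf is an arbitrary (hypothetical) file name
    printf '[Service]\nTasksMax=infinity\n' > "$dir/tasksmax.conf" || return 1
    # Print the path that was written so the caller can inspect it
    printf '%s\n' "$dir/tasksmax.conf"
}

# Example, as root, followed by a reload so systemd picks it up:
#   write_tasksmax_dropin systemd-udevd.service
#   systemctl daemon-reload
#   systemctl restart systemd-udevd.service
```

The same drop-in approach should work for any other service that needs
to fork a large number of processes.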

> Actually, funny that you mention it. (Maybe the following will help others)
> I use the server to run some type of proprietary service that spawns
> ~2000+ process. This service stopped working the second I switched to
> F24.
> After spending a couple of hours banging my head against the wall, I
> noticed that systemd is ignoring the service user's limits.d nproc
> limit (also the service unit's LimitNPROC limit) preventing it from
> forking more than 256 process.
> Adding TasksMax=infinity to the service's unit solved the problem.
> 
> Related?
Most likely.

Zbyszek
--
devel mailing list
devel@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/devel@lists.fedoraproject.org


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Gilboa Davara
On Thu, Jul 28, 2016 at 3:23 PM, Tomasz Torcz  wrote:
> On Thu, Jul 28, 2016 at 03:04:12PM +0300, Gilboa Davara wrote:
>> On Thu, Jul 28, 2016 at 6:07 AM, Zbigniew Jędrzejewski-Szmek
>>  wrote:
>> >
>> > It is possible that udevd is failing for whatever reason... but apart
>> > from the fact that some of the devices links are missing you don't
>> > provide any info. At the minimum: boot logs, and information which links
>> > are missing.
>>
>> Boot log info is pretty scarce. I only see a lot of systemd timed-out
>> log messages:
>> E.g.
>>
>> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
>> dev-VolRoot-LogStorageMData_P123.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P125.device: Job
>> dev-VolRoot-LogStorageMData_P125.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P124.device: Job
>> dev-VolRoot-LogStorageMData_P124.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P122.device: Job
>> dev-VolRoot-LogStorageMData_P122.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P127.device: Job
>> dev-VolRoot-LogStorageMData_P127.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
>> dev-VolRoot-LogStorageMData_P123.device/start timed out.
>> systemd[1]: dev-VolRoot-LogStorageMData_P126.device: Job
>> dev-VolRoot-LogStorageMData_P126.device/start timed out.
>>
>> I did see some udev / LVM error messages:
>> systemd-udevd[2181]: fork of '/usr/sbin/dmsetup splitname
>> --nameprefixes --noheadings --rows VolRoot-LogStorageMData_P96'
>> failed: Resource temporarily unavailable
>
> Hm, do you have TasksMax=infinity in systemd-udevd.service?
> (commit de2edc008a612e152f0690d5063d53001c4e13ff)
>

Nope. I don't see TasksMax in systemd-udevd.service.

Actually, funny that you mention it. (Maybe the following will help others)
I use the server to run a proprietary service that spawns
2000+ processes. This service stopped working the moment I switched to
F24.
After spending a couple of hours banging my head against the wall, I
noticed that systemd was ignoring the service user's limits.d nproc
limit (and the service unit's LimitNPROC limit), preventing the service
from forking more than 256 processes.
Adding TasksMax=infinity to the service's unit solved the problem.

Related?

- Gilboa


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Tomasz Torcz
On Thu, Jul 28, 2016 at 03:04:12PM +0300, Gilboa Davara wrote:
> On Thu, Jul 28, 2016 at 6:07 AM, Zbigniew Jędrzejewski-Szmek
>  wrote:
> >
> > It is possible that udevd is failing for whatever reason... but apart
> > from the fact that some of the devices links are missing you don't
> > provide any info. At the minimum: boot logs, and information which links
> > are missing.
> 
> Boot log info is pretty scarce. I only see a lot of systemd timed-out
> log messages:
> E.g.
> 
> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
> dev-VolRoot-LogStorageMData_P123.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P125.device: Job
> dev-VolRoot-LogStorageMData_P125.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P124.device: Job
> dev-VolRoot-LogStorageMData_P124.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P122.device: Job
> dev-VolRoot-LogStorageMData_P122.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P127.device: Job
> dev-VolRoot-LogStorageMData_P127.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
> dev-VolRoot-LogStorageMData_P123.device/start timed out.
> systemd[1]: dev-VolRoot-LogStorageMData_P126.device: Job
> dev-VolRoot-LogStorageMData_P126.device/start timed out.
> 
> I did see some udev / LVM error messages:
> systemd-udevd[2181]: fork of '/usr/sbin/dmsetup splitname
> --nameprefixes --noheadings --rows VolRoot-LogStorageMData_P96'
> failed: Resource temporarily unavailable

Hm, do you have TasksMax=infinity in systemd-udevd.service?
(commit de2edc008a612e152f0690d5063d53001c4e13ff)

-- 
Tomasz Torcz               To co nierealne -- tutaj jest normalne.
xmpp: zdzich...@chrome.pl  Ziomale na życie mają tu patenty specjalne.


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Gilboa Davara
On Thu, Jul 28, 2016 at 9:59 AM, Lennart Poettering wrote:
> On Wed, 27.07.16 21:35, Gilboa Davara (gilb...@gmail.com) wrote:
>
>> Hello all,
>>
>> I need help trying to debug a weird bug that I'm hitting.
>> I've got a server with fairly large storage (>100TB) that needs to
>> handle very-small-files.
>> Due to performance considerations I decided to split the large array
>> into 128 ext4 partitions (rather than use a single xfs partition).
>>
>> I recently upgraded the server to F24 (w/ kernel 4.5.5, 4.6.4 refuses
>> to boot on the machine) and I'm now facing a weird problem: On boot,
>> systemd fails to mount all the partition dropping to emergency shell.
>>
>> At least as far as I can see, udev fails to create some symbolic links
>> under /dev/VolRoot/, even though it has no issues creating the same
>> symbolic links under /dev/mapper/VolRoot-LogStorageMData_PXX.
>> On the other hand systemd still uses the broken /dev/VolRoot/
>> device units, even though we moved all the entries in fstab to
>> /dev/mapper/VolRoot-LogStorageMData_PXX and manually ran
>> systemd-fstab-generator.
>
> LVM questions are best directed to the LVM people, we have very little
> experience with that and the LVM ruleset is quite invasively altering
> the udev logic.
>
> Lennart
>


Lennart,

It seems that it's indeed LVM-related.
(Boot log attached to previous reply mail)

Thanks,
Gilboa


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Gilboa Davara
On Thu, Jul 28, 2016 at 6:07 AM, Zbigniew Jędrzejewski-Szmek wrote:
>
> It is possible that udevd is failing for whatever reason... but apart
> from the fact that some of the devices links are missing you don't
> provide any info. At the minimum: boot logs, and information which links
> are missing.

Boot log info is pretty scarce. I only see a lot of systemd timed-out
log messages:
E.g.

systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
dev-VolRoot-LogStorageMData_P123.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P125.device: Job
dev-VolRoot-LogStorageMData_P125.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P124.device: Job
dev-VolRoot-LogStorageMData_P124.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P122.device: Job
dev-VolRoot-LogStorageMData_P122.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P127.device: Job
dev-VolRoot-LogStorageMData_P127.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P123.device: Job
dev-VolRoot-LogStorageMData_P123.device/start timed out.
systemd[1]: dev-VolRoot-LogStorageMData_P126.device: Job
dev-VolRoot-LogStorageMData_P126.device/start timed out.

I did see some udev / LVM error messages:
systemd-udevd[2181]: fork of '/usr/sbin/dmsetup splitname
--nameprefixes --noheadings --rows VolRoot-LogStorageMData_P96'
failed: Resource temporarily unavailable

Either way, I've attached a boot log.

> Note that just running the generator by hand has no effect. You need
> systemctl daemon-reload to reload the units, but that will re-run the
> generators by itself.

Just to be certain: Changing fstab requires systemctl daemon-reload?

>
> I'd suggest commenting out (or adding "noauto") the mount points in 
> /etc/fstab,
> (and of course also disabling any units which make use of them if they are
> not conditionalized), and debugging in a booted system. It's most likely to
> be easier this way.
>

The attached log seems to suggest this is indeed a udev/LVM issue.
Thanks for taking the time to help :)

- Gilboa


boot.log.bz2
Description: BZip2 compressed data


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-28 Thread Lennart Poettering
On Wed, 27.07.16 21:35, Gilboa Davara (gilb...@gmail.com) wrote:

> Hello all,
> 
> I need help trying to debug a weird bug that I'm hitting.
> I've got a server with fairly large storage (>100TB) that needs to
> handle very-small-files.
> Due to performance considerations I decided to split the large array
> into 128 ext4 partitions (rather than use a single xfs partition).
> 
> I recently upgraded the server to F24 (w/ kernel 4.5.5, 4.6.4 refuses
> to boot on the machine) and I'm now facing a weird problem: On boot,
> systemd fails to mount all the partition dropping to emergency shell.
> 
> At least as far as I can see, udev fails to create some symbolic links
> under /dev/VolRoot/, even though it has no issues creating the same
> symbolic links under /dev/mapper/VolRoot-LogStorageMData_PXX.
> On the other hand systemd still uses the broken /dev/VolRoot/
> device units, even though we moved all the entries in fstab to
> /dev/mapper/VolRoot-LogStorageMData_PXX and manually ran
> systemd-fstab-generator.

LVM questions are best directed to the LVM people, we have very little
experience with that and the LVM ruleset is quite invasively altering
the udev logic.

Lennart

-- 
Lennart Poettering, Red Hat


Re: F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-27 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Jul 27, 2016 at 09:35:22PM +0300, Gilboa Davara wrote:
> Hello all,
> 
> I need help trying to debug a weird bug that I'm hitting.
> I've got a server with fairly large storage (>100TB) that needs to
> handle very-small-files.
> Due to performance considerations I decided to split the large array
> into 128 ext4 partitions (rather than use a single xfs partition).
> 
> I recently upgraded the server to F24 (w/ kernel 4.5.5, 4.6.4 refuses
> to boot on the machine) and I'm now facing a weird problem: On boot,
> systemd fails to mount all the partition dropping to emergency shell.
> 
> At least as far as I can see, udev fails to create some symbolic links
> under /dev/VolRoot/, even though it has no issues creating the same
> symbolic links under /dev/mapper/VolRoot-LogStorageMData_PXX.
> On the other hand systemd still uses the broken /dev/VolRoot/
> device units, even though we moved all the entries in fstab to
> /dev/mapper/VolRoot-LogStorageMData_PXX and manually ran
> systemd-fstab-generator.
> 
> Valid mapper:
> $ ls -l /dev/mapper/VolRoot-LogStorageMData_P* | wc -l
> 128
> 
> Invalid VGName:
> $ ls -l /dev/VolRoot/LogStorageMData_P* | wc -l
> 95 <--- Should be 128.
> 
> fstab:
> $ cat /etc/fstab | grep VolRoot-LogStorageMData_P | wc -l
> 128
> 
> systemd broken unit files:
> $ systemctl -a --no-pager | /bin/grep dev-VolRoot-LogStorageMData | wc -l
> 95 <--- Should be 128.

It is possible that udevd is failing for whatever reason... but apart
from the fact that some of the device links are missing, you don't
provide any info. At a minimum we need boot logs and information about
which links are missing.
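
One way to get the list of missing links is to compare the two
directories. A rough sketch follows; the function name is made up, and it
assumes that neither the VG name nor the LV names contain hyphens (device
mapper doubles hyphens in /dev/mapper names, which this does not handle).

```shell
#!/bin/sh
# Sketch: list LVs that have a /dev/mapper node but no /dev/<VG>/<LV> link.
# $1: VG name            $2: LV name prefix
# $3: mapper directory (default /dev/mapper)
# $4: VG directory     (default /dev/<VG>)
missing_lv_links() {
    vg=$1
    prefix=$2
    mapper_dir=${3:-/dev/mapper}
    vg_dir=${4:-/dev/$vg}
    for node in "$mapper_dir/$vg-$prefix"*; do
        [ -e "$node" ] || continue   # the glob matched nothing
        lv=${node##*/}               # strip the directory part
        lv=${lv#"$vg-"}              # strip the "<VG>-" prefix
        [ -e "$vg_dir/$lv" ] || printf '%s\n' "$lv"
    done
}

# Example for the volume group in this thread:
#   missing_lv_links VolRoot LogStorageMData_P
```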

Note that just running the generator by hand has no effect. You need
systemctl daemon-reload to reload the units, but that will re-run the
generators by itself.

I'd suggest commenting out (or adding "noauto") the mount points in /etc/fstab,
(and of course also disabling any units which make use of them if they are
not conditionalized), and debugging in a booted system. It's most likely to
be easier this way.
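
With 128 entries, adding "noauto" by hand is tedious, so something like
the following may help. It is a sketch only: the function name is made up,
it prints the modified fstab to stdout rather than editing in place, and
it collapses the whitespace of the lines it changes to single spaces.

```shell
#!/bin/sh
# Sketch: print an fstab with "noauto" appended to matching entries.
# $1: path to the fstab file
# $2: literal substring to look for in the device field (first column)
fstab_add_noauto() {
    awk -v pat="$2" '
        # Pass comments, blank lines and short lines through untouched
        /^[ \t]*#/ || NF < 4 { print; next }
        # Append noauto only if the options field does not already have it
        index($1, pat) && $4 !~ /(^|,)noauto(,|$)/ {
            $4 = $4 ",noauto"
        }
        { print }
    ' "$1"
}

# Example: write to a scratch file, review it, then install it by hand:
#   fstab_add_noauto /etc/fstab VolRoot-LogStorageMData_P > /tmp/fstab.new
#   diff -u /etc/fstab /tmp/fstab.new
```

After replacing /etc/fstab, run systemctl daemon-reload so the generators
are re-run and the mount units are regenerated.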

Zbyszek


F24: systemd fails to mount 128 LVM partitions. (udev issue?)

2016-07-27 Thread Gilboa Davara
Hello all,

I need help trying to debug a weird bug that I'm hitting.
I've got a server with fairly large storage (>100TB) that needs to
handle very-small-files.
Due to performance considerations I decided to split the large array
into 128 ext4 partitions (rather than use a single xfs partition).

I recently upgraded the server to F24 (w/ kernel 4.5.5; 4.6.4 refuses
to boot on the machine) and I'm now facing a weird problem: On boot,
systemd fails to mount all the partitions and drops to the emergency shell.

At least as far as I can see, udev fails to create some symbolic links
under /dev/VolRoot/, even though it has no issues creating the same
symbolic links under /dev/mapper/VolRoot-LogStorageMData_PXX.
On the other hand, systemd still uses the broken /dev/VolRoot/
device units, even though we moved all the entries in fstab to
/dev/mapper/VolRoot-LogStorageMData_PXX and manually ran
systemd-fstab-generator.

Valid mapper:
$ ls -l /dev/mapper/VolRoot-LogStorageMData_P* | wc -l
128

Invalid VGName:
$ ls -l /dev/VolRoot/LogStorageMData_P* | wc -l
95 <--- Should be 128.

fstab:
$ cat /etc/fstab | grep VolRoot-LogStorageMData_P | wc -l
128

systemd broken unit files:
$ systemctl -a --no-pager | /bin/grep dev-VolRoot-LogStorageMData | wc -l
95 <--- Should be 128.

Any suggestions are welcome.

- Gilboa