Re: Request for testing: Fedora 37 pre-Beta validation tests

2022-08-30 Thread Chris Murphy


On Mon, Aug 29, 2022, at 8:36 PM, Josh Berkus wrote:
> On 8/29/22 17:22, Adam Williamson wrote:
>> It would be really great if we can get the validation tests run now so
>> we can find any remaining blocker bugs in good time to get them fixed.
>> Right now the blocker list looks short, but there are definitely some
>> tests that have not been run.
>
> Last I checked, flatpak was still broken.  Will retest this week.

What's broken with flatpak? I've been using several flatpaks OK since 'dnf 
system-upgrade' a week ago.


-- 
Chris Murphy


Re: do we need Plymouth?

2022-08-09 Thread Chris Murphy


On Tue, Aug 9, 2022, at 11:29 AM, Neal Gompa wrote:

> Plymouth is used to provide the interface for decrypting disks and
> presenting information about software/firmware updates, so I'd be
> loath to remove it.

On desktops, yes. But I think we could modify 
systemd-ask-password-plymouth.service so that it omits --plymouth from the 
systemd-tty-ask-password-agent invocation?
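
Something like this drop-in, maybe (a sketch only; the stock ExecStart line 
may differ, so check the installed unit first):

# /etc/systemd/system/systemd-ask-password-plymouth.service.d/no-plymouth.conf
[Service]
ExecStart=
ExecStart=/usr/bin/systemd-tty-ask-password-agent --watch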


-- 
Chris Murphy


do we need Plymouth?

2022-08-09 Thread Chris Murphy
cc: cloud@, server@ fpo

Hi,

When troubleshooting early boot issues with a console, e.g. virsh console, or 
the virt-manager console, or even a server's remote management console 
providing a kind of virtual serial console... the boot scroll is completely 
wiped. This is new behavior within the last, I'm not sure, 6-12 months. 
Everything before about the 3 second mark is cleared as if the console reset 
command was used; it even wipes my local scrollback. 

I captured this with the script command, and when I cat this 76K file, it even 
wipes the local console again. So there is some kind of control character 
that's ordering my local console to do this. The file itself contains the full 
kernel messages. I just can't cat it. I have to open it in a text editor that 
ignores this embedded console reset command.

With the help of @glb, we discovered that this is almost certainly Plymouth. 
When I boot with the parameter plymouth.enable=0 the problem doesn't happen. 
Hence the higher level question: do we really even need Plymouth in Server or 
Cloud editions?

I suppose ideally we'd track down the problem and fix plymouth, so that 
existing installations get fixed. Whereas if we remove plymouth, we have to 
ponder whether and how to remove plymouth from existing installations. Unless 
we flat out aren't using it at all.

Any ideas? 

Plymouth is in the @core group in fedora-comps, so pretty much everything gets 
it.
https://pagure.io/fedora-comps/blob/main/f/comps-f37.xml.in#_635


--
Chris Murphy


Re: Proposing a new PRD

2022-05-24 Thread Chris Murphy
On Tue, May 24, 2022 at 11:28 AM Duncan  wrote:
>
> Hi everyone.
>
> I have updated the PRD and I would like to request additional comments
> and improvements to this first draft.
>
> 3 Participants
> ==
>
>   Currently the following people are involved in the Cloud Working
>   group.
>
>   - [David Duncan]
>   - [Dusty Mabe]
>   - [Major Hayden]
>   - [Neal Gompa]
>   - [Davida Cavalca]
>   - [Michel Salim]
>   - [Amy Marrich]
>   - [Joe Doss]


Chris Murphy : chrismurphy


-- 
Chris Murphy


Re: [Fedocal] Reminder meeting : Fedora Cloud Workgroup

2021-11-23 Thread Chris Murphy
On Tue, Nov 23, 2021 at 10:00 AM  wrote:
>
> Dear all,
>
> You are kindly invited to the meeting:
>Fedora Cloud Workgroup on 2021-11-25 from 15:00:00 to 16:00:00 UTC
>At fedora-meetin...@irc.libera.chat


Since this is Thanksgiving day in the U.S., it's best to assume this
meeting isn't going to happen. Next meeting scheduled for Dec 9.

-- 
Chris Murphy


Re: Fedora 35 Cloud Base Images for Amazon Public Cloud aarch64 AMIs

2021-11-12 Thread Chris Murphy
On Fri, Nov 12, 2021 at 2:58 PM Dick Marinus  wrote:
>
> Hi,
>
> The list for Fedora 35 Cloud Base Images for Amazon Public Cloud aarch64
> AMIs is empty at:
>
> https://alt.fedoraproject.org/cloud/
>
> Is there a problem building the aarch64 images for AWS, can I be of any
> help?
>
Thanks for the report. I've filed this issue
https://pagure.io/fedora-web/websites/issue/220

Looks like the images are available in AWS, but just aren't listed at
alt.fedoraproject.org/cloud



-- 
Chris Murphy


Re: Fedora Cloud Meeting Minutes 2021-11-11

2021-11-11 Thread Chris Murphy
On Thu, Nov 11, 2021 at 1:21 PM David Duncan  wrote:
>
> Minutes: 
> https://meetbot-raw.fedoraproject.org/teams/fedora_cloud_meeting/fedora_cloud_meeting.2021-11-11-15.00.html
> Minutes (text): 
> https://meetbot-raw.fedoraproject.org/teams/fedora_cloud_meeting/fedora_cloud_meeting.2021-11-11-15.00.txt
> Log: 
> https://meetbot-raw.fedoraproject.org/teams/fedora_cloud_meeting/fedora_cloud_meeting.2021-11-11-15.00.log.html


Sorry I missed the meeting, but yeah, +1 to (re)making Cloud
officially an edition.

-- 
Chris Murphy


Re: Fix fallocate issue on cloud-init 19.4 for Fedora 33 cloud images

2021-04-27 Thread Chris Murphy
tl;dr Creating a swapfile with fallocate should work on XFS now.

Found this thread from about 3 years ago
https://www.spinics.net/lists/linux-mm/msg147100.html

There were a few things that needed work to fix it, and I'm not sure
which one finally did it, but this 2019 patch was part of that series:
https://lore.kernel.org/linux-xfs/20191008071527.29304-16-...@lst.de/

That is in 5.7, which is what Fedora 33 shipped with.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/xfs/xfs_aops.c?h=v5.7

Anyway...even though I can't nail down exactly when it got fixed,
today I asked the XFS maintainer about all of this and he said
fallocate'd swapfiles should work. And I also tested it with kernel
5.11.16 and 5.12 and it does work.
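
For reference, the test boils down to something like this (sizes illustrative; 
the swapon step is the one that used to fail on XFS):

$ sudo fallocate -l 2G /var/swapfile
$ sudo chmod 600 /var/swapfile
$ sudo mkswap /var/swapfile
$ sudo swapon /var/swapfile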

--
Chris Murphy


Re: Fix fallocate issue on cloud-init 19.4 for Fedora 33 cloud images

2021-04-26 Thread Chris Murphy
On Mon, Apr 26, 2021 at 8:18 AM Federico Ressi  wrote:
>
> Hello all,
>
> I am writing to this list because I found out F33 cloud image cloud init 
> support for creating swap files looks to be broken probably because of a 
> known Linux 5.7+ kernel issue [1].
>
> The problem is cloud-init is trying to create a new swap file by using 
> fallocate command that is not working well (kernel is complaining the file 
> has holes when executing swapon command just later). The easy workaround for 
> this issue is to use dd command instead of fallocate command in cloud-init.

fallocate's default mode is zero (plain allocation), and doesn't create
holes. If there are holes, it's a kernel bug, and it needs to be fixed
and the kernel updated. It's also worth making sure cloud-init is using
fallocate's default mode of zero.

The simplest workaround is to just create a swap partition instead of
a swapfile, when using cloud images that have the buggy kernel. Or
alternatively don't create either one, and instead write a config to
/etc/systemd/zram-generator.conf so the installation uses swap on a
compressed zram device.
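
For example, something along these lines in /etc/systemd/zram-generator.conf 
(option names per the zram-generator version available at the time):

[zram0]
memory-limit = none
zram-fraction = 0.5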

> Because I don't know the whole procedure required for submitting a patch I am 
> writing to you in the hope you can help me in having the F33 cloud image 
> fixed.

I don't think cloud images get reissued after release. But you can
reproduce that process and make your own cloud image that's identical
to the Fedora one, except for the kernel. That way you can use a newer
kernel that doesn't have the bug.

Or like Dusty suggests, give the Fedora 34 Cloud images a whirl. Out tomorrow!

-- 
Chris Murphy


Re: preview of swap on ZRAM feature/change

2020-06-02 Thread Chris Murphy
On Mon, Jun 1, 2020 at 2:55 PM Michael Hall  wrote:
>
> I joined this list because I'm interested in learning more about the specific 
> requirements and features of cloud images.
>
> So I'm wondering what ZRAM adds or what problem ZRAM solves for a cloud image?

Simplest answer: better utilization of a limited resource, memory.

It does this by making it possible for the kernel to evict anonymous
pages, thereby avoiding repetitive reclaiming of file pages when under
any sort of memory pressure. And it's faster than swap to disk. Since
it doesn't require a preallocation, a workload that never needs swap
pays nearly zero cost. If you aren't using some swap, aren't you in a
sense overprovisioning memory?

-- 
Chris Murphy


Re: preview of swap on ZRAM feature/change

2020-06-01 Thread Chris Murphy
On Mon, Jun 1, 2020 at 12:44 PM Simo Sorce  wrote:
>
> On Mon, 2020-06-01 at 10:37 -0600, Chris Murphy wrote:
> > Thanks for the early feedback!
> >
> > On Mon, Jun 1, 2020 at 7:58 AM Stephen Gallagher  
> > wrote:
> > > * Reading through the Change, you write:
> > > "using a ZRAM to RAM ratio of 1:2, and capped† to 4GiB" and then you
> > > talk about examples which are using 50% of RAM as ZRAM. Which is it? A
> > > ratio of 1:2 implies using 33% of RAM as ZRAM.
> >
> > This ratio is just a fraction, part of whole, where RAM is the whole.
> > This convention is used in the zram (package).
> >
> > Note that /dev/zram0 is a virtual block device, similar to the
> > 'lvcreate -V' option for thin volumes, size is a fantasy. And the ZRAM
> > device size is not a preallocation of memory. If the compression ratio
> > 2:1 (i.e. 200%) holds, then a ZRAM device sized to 50% of RAM will not
> > use more than 25% of RAM.
>
> What happens if you can't compress memory at all?
> Will zram use more memory? Or will it simply become useless (but
> hopefully harmless) churn?

It is not a no-op. There is CPU and memory consumption in this case.
It actually reduces available memory for the workload. I haven't yet
seen this in practice and haven't come up with a synthetic test -
maybe something that just creates a bunch of anonymous pages using
/dev/urandom?

Since the device by default is small, it ends up performance-wise
being a no-op. You either arrive at the same oom you would have in
the no-swap-at-all case, or it just starts spilling over into
swap-on-disk if you still have one.

I have done quite a lot of testing of the webkit gtk compile case,
where it uses ncpus + 2 for the number of jobs by default, and where
it gets to a point eventually needing up to 1.5 GiB per job. Super
memory hungry.

8 GiB RAM + no swap = this sometimes triggers kernel oomkiller
quickly, but sometimes just sits there for a really long time before
it triggers, it does always eventually trigger. With earlyoom enabled
(default on Workstation) the oom happens faster, usually within 5
minutes.

8 GiB RAM + 8 GiB swap-on-disk = this sometimes but far less often
results in kernel oomkiller trigger; most often it sits in
pageout/pagein for 30+ minutes with a totally frozen GUI. With
earlyoom enabled, is consistently killed inside of 10 minutes.

8 GiB RAM + 8 GiB swap-on-ZRAM = exact reverse: sometimes but less
often results in 30+ minute hangs with frozen GUI, usually results in
kernel oom killer within 5 minutes. With earlyoom enabled consistently
is killed inside of 5 minutes.

8 GiB RAM + 16 GiB swap-on-disk = consistently finishes the compile.

8 GiB RAM + 16 GiB swap-on-ZRAM = my log doesn't have this test. I
thought I had done it. But I think it's a risky default configuration
because if you don't get 2:1 compression and the task really needs
this much RAM, it's not just IO churn like with disk-based swap. It's
memory and CPU, and if it gets wedged in, it's a forced power off.
That's basically where we were at with Workstation edition before
earlyoom, which is not good, but also not as huge a problem as it is
with servers, where you have to send someone to go hit a power button.
In these cases, sshd often will not respond before timeout. So no
sysrq+b unless you have that command pretyped out and ready to hit
enter.

The scenario where you just don't have the budget for enough memory
for the workload, and you have to use swap, contrary to the "in
defense of swap" article referenced in the change? I think that's
maybe a better use case for zswap. I don't have tests that
conclusively prove that zswap's LRU basis for eviction from the zswap
memory pool to the disk swap is better than how the kernel deals with
two swaps (the zram and disk case). But in theory the LRU basis is
smarter.

Making it easier for folks to experiment with this I think is maybe
undersold in the proposal. But the main idea is to convey that the
proposed defaults are safe. Later in the proposal I propose they might
be too safe, with the 4GiB cap. That might be refused in favor of 50%
RAM across the board. But that could be a future enhancement if this
proposal is accepted.




>
> > I'll try to clear this up somehow; probably avoid using the term ratio
> > and just go with fraction/percentage. And also note the use of
> > 'zramctl' to see the actual compression ratio.
> >
> > > * This Change implies the de facto death of hibernation in Fedora.
> > > Good riddance, IMHO. It never worked safely.
> >
> > UEFI Secure Boot put us on this path. There's still no acceptable
> > authenticated encrypted hibernation image scheme, and the SUSE
> > developer working on it told me a few months ago that the status is
> > the same as last year and there's no ETA for when he gets the time to
> > revisit it.

Re: preview of swap on ZRAM feature/change

2020-06-01 Thread Chris Murphy
Thanks for the early feedback!

On Mon, Jun 1, 2020 at 7:58 AM Stephen Gallagher  wrote:
>
> * Reading through the Change, you write:
> "using a ZRAM to RAM ratio of 1:2, and capped† to 4GiB" and then you
> talk about examples which are using 50% of RAM as ZRAM. Which is it? A
> ratio of 1:2 implies using 33% of RAM as ZRAM.

This ratio is just a fraction, part of whole, where RAM is the whole.
This convention is used in the zram (package).

Note that /dev/zram0 is a virtual block device, similar to the
'lvcreate -V' option for thin volumes, size is a fantasy. And the ZRAM
device size is not a preallocation of memory. If the compression ratio
2:1 (i.e. 200%) holds, then a ZRAM device sized to 50% of RAM will not
use more than 25% of RAM.

I'll try to clear this up somehow; probably avoid using the term ratio
and just go with fraction/percentage. And also note the use of
'zramctl' to see the actual compression ratio.
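
e.g. something like:

$ zramctl --output NAME,DISKSIZE,DATA,COMPR,TOTAL /dev/zram0

where DATA vs COMPR gives the effective compression ratio for what has
actually been swapped out.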

> * This Change implies the de facto death of hibernation in Fedora.
> Good riddance, IMHO. It never worked safely.

UEFI Secure Boot put us on this path. There's still no acceptable
authenticated encrypted hibernation image scheme, and the SUSE
developer working on it told me a few months ago that the status is
the same as last year and there's no ETA for when he gets the time to
revisit it.


> * Can the upgrade process be made to detect the lack of existing swap
> and not enable the zswap in that case?

(We're not using zswap at all in this implementation. zram!=zswap -
easy mistake.)

I expect in the "already has a swap partition" case, no one will
complain about getting a 2nd swap device that's on /dev/zram0, because
it'll just be faster and "spill over" to the bigger swap-on-disk.

It's the use case where "I explicitly did not create a swap device
because I hate swap thrashing" that I suspect there may be complaints.
We can't detect that sentiment. All we could do is just decide that no
upgrades get the feature.

> Generally, we should probably assume (given existing defaults) that
> anyone who has no swap running chose that explicitly and to change it
> would lead to complaints.

Perhaps.

But I assert their decision is based on both bad information (wrong
assumptions), and prior bad experience. Even if in the end the
decision is to not apply the feature on upgrades, I think it's worth
some arguing to counter wrong assumptions and bad experiences.


> * If you're going to do the Supplements:, you need to do `Supplements:
> fedora-release-common` or you won't get everyone. The `fedora-release`
> package is for non-Edition/Spin installs.

Right. Fixed.

I suggest these three lines in the configuration (I've updated the
change proposal's "how to test" section to include this):

[zram0]
memory-limit = none
zram-fraction = 0.5

There is no cap functionality yet in the generator.


--
Chris Murphy


preview of swap on ZRAM feature/change

2020-06-01 Thread Chris Murphy
Hi,

This topic has been discussed a couple times on devel@ over the past
year -  related to resource control, and better interactivity in low
memory situations. And now I have a preview of the change proposal
ready.

https://fedoraproject.org/wiki/Changes/SwapOnZRAM

The proposal aims for default partitioning, for all Fedora editions
and spins, to not create a swap-on-disk partition. And instead create
a compression-based RAM disk, called ZRAM, and use that for swap.

Previous conversations with cloud and server folks suggest it's
somewhat common to not have swap at all. Hopefully I can change your
minds. :D Fast swap is good.

I'm confident a one-size-fits-all size for the ZRAM device is
possible, as a fraction of RAM, with a max size (cap). This should be
aggressive enough for low memory devices, while also not expending as
much overhead for the systems with a lot of memory. It might be an
option to ship different configurations, if necessary.

There is a test day planned. But I'd like to get solid buy-in from
cloud and server folks before then.

Thanks,

-- 
Chris Murphy


OOM managers, resource control

2020-01-21 Thread Chris Murphy
Hi,

Workstation working group continues to evaluate oom managers and seek
input from domain experts on the subject. I've come across this video
from the All Systems Go conference, on the larger subject of resource
control. Its server+cloud+container oriented. So I think it speaks
directly to your use cases.

Resource Control (2019) Dan Schatzberg
https://www.youtube.com/watch?v=30i7SamZxRU

I'm looking into setting up a discussion session with Dan, and doing
some Q&A. I'll report back when I know more about that.


-- 
Chris Murphy


Re: earlyoom by default

2020-01-13 Thread Chris Murphy
On Mon, Jan 13, 2020 at 10:51 AM Dusty Mabe  wrote:
>
>
>
> On 1/8/20 5:21 PM, Chris Murphy wrote:
> > On Mon, Jan 6, 2020 at 7:56 PM Dusty Mabe  wrote:
> >>
> >> For cloud at least it's very common to not have swap. I'd argue for servers
> >> you don't want them swapping either but resources aren't quite as elastic 
> >> as
> >> in the cloud so you might not be able to burst resources like you can in 
> >> the cloud.
> >
> > There's also discussion about making oomd a universal solution for
> > this; but I came across this issue asserting PSI (kernel pressure
> > stall information) does not work well without swap.
> > https://github.com/facebookincubator/oomd/issues/80
> >
> > Ignoring whether+what+when a workaround may be found for that, what do
> > you think about always having swap-on-ZRAM enabled in these same
> > environments? The idea there is a configurable size /dev/zram block
> > device (basically a compressible RAM disk) on which swap is created.
> > Based on discussions with anaconda, IoT, Workstation, and systemd
> > folks - I think there's a potential to converge on systemd-zram
> > generator (rust) to do this.
> > https://github.com/systemd/zram-generator
> >
> > Workstation wg is mulling over the idea of dropping separate swap
> > partitions entirely, and using a swap-on-ZRAM device instead; possibly
> > with a dynamically created swapfile for certain use cases like
> > hibernation. So I'm curious if this might have broader appeal, and get
> > systemd-zram generator production ready.
> >
>
>
> Seems like an interesting concept. Since it doesn't require any disk setup
> it's easy to turn it off or configure it I assume.
>
> +1

Yes. My suggestion is to install this generator distribution-wide. The
on/off switch is the existence of a configuration file. If there's no
config, the generator is a no-op. And it won't run in containers
regardless.

Next, the discussion is whether the distribution default is with
config, or without config. Either way it's overridable.

I think a reasonable universal default would be something like a
zram:RAM ratio of 1:2 or 1:1. And cap it to somewhere around 2-4G.

The rationale:

- Fedora IoT folks use swap on zram by default out of the box (via
zram package, not this zram-generator) for a long time, maybe since
the beginning.

- Upstream zram kernel devs say it's reasonable to go up to 2:1
because compression ratios are about 2:1, but it's pointless to go
above that. Therefore 1:1 is quite conservative; 0.5 is even more
conservative but still useful.

- 1:1 is consistent with existing defaults (Anaconda, anyway)

- The cap means systems with a lot of RAM will only use it
incidentally. Swap thrashing that is IO bound with traditional swap
becomes CPU bound on a zram device (because of all the
compression/decompression hits), so making it small avoids too much of
that.

- Considers upgrade behavior, where existing traditional swap on a
partition is being used; create the swap on zram device with a high
priority, so it's used first.
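
To make the suggested default concrete, the sizing works out to roughly
this (illustrative only, not the generator's actual code):

ram_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
zram_kib=$(( ram_kib / 2 ))              # 1:2 zram:RAM
cap_kib=$(( 4 * 1024 * 1024 ))           # ~4 GiB cap
[ "$zram_kib" -gt "$cap_kib" ] && zram_kib=$cap_kib
echo "zram device size: ${zram_kib} KiB"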



-- 
Chris Murphy


Re: earlyoom by default

2020-01-08 Thread Chris Murphy
On Mon, Jan 6, 2020 at 7:56 PM Dusty Mabe  wrote:
>
> For cloud at least it's very common to not have swap. I'd argue for servers
> you don't want them swapping either but resources aren't quite as elastic as
> in the cloud so you might not be able to burst resources like you can in the 
> cloud.

There's also discussion about making oomd a universal solution for
this; but I came across this issue asserting PSI (kernel pressure
stall information) does not work well without swap.
https://github.com/facebookincubator/oomd/issues/80

Ignoring whether+what+when a workaround may be found for that, what do
you think about always having swap-on-ZRAM enabled in these same
environments? The idea there is a configurable size /dev/zram block
device (basically a compressible RAM disk) on which swap is created.
Based on discussions with anaconda, IoT, Workstation, and systemd
folks - I think there's a potential to converge on systemd-zram
generator (rust) to do this.
https://github.com/systemd/zram-generator

Workstation wg is mulling over the idea of dropping separate swap
partitions entirely, and using a swap-on-ZRAM device instead; possibly
with a dynamically created swapfile for certain use cases like
hibernation. So I'm curious if this might have broader appeal, and get
systemd-zram generator production ready.


-- 
Chris Murphy


earlyoom by default

2020-01-06 Thread Chris Murphy
Hi server@ and cloud@ folks,

There is a system-wide change to enable earlyoom by default on Fedora
Workstation. It came up in today's Workstation working group meeting
that I should give you folks a heads up about opting into this change.

Proposal
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
Devel@ discussion
https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/message/YXDODS3G4YCS7MT4J2QJMJ7EXCVR7NQ2/

The main issue on a workstation, heavy swap leading to an unresponsive
system, is perhaps not as immediately frustrating on a server.  But
the consequences of indefinite hang or the kernel oom-killer
triggering, which is a SIGKILL, are perhaps worse.

On the plus side, earlyoom is easy to understand, and its first
attempt is a SIGTERM rather than SIGKILL. It uses oom_score, same as
kernel oom-killer, to determine the victim.

The SIGTERM is issued to the process with the highest oom_score only
if both memory and swap fall to 10% free, and SIGKILL is issued to the
process with the highest oom_score once memory and swap fall to 5%
free. Those percentages can be tweaked, but the KILL percentage is
always 1/2 of the TERM percentage, so it's a bit rudimentary.
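
To illustrate, the defaults map to something like this (flag names per
earlyoom's documentation; Fedora's packaged default config may differ):

# /etc/default/earlyoom
EARLYOOM_ARGS="-m 10 -s 10"   # SIGTERM at 10% free; SIGKILL at half that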

One small concern I have is, what if there's no swap? That's probably
uncommon for servers, but I'm not sure about cloud. But in this case,
SIGTERM happens at 10% of RAM, which leaves a lot of memory on the
table, and for a server with significant resources it's probably too
high. What about 4%? Maybe still too high? One option I'm thinking of
is a systemd conditional that would not run earlyoom on systems
without a swap device, which would leave these systems no worse off
than they are right now. [i.e. they eventually recover (?),
indefinitely hang (likely), or oom-killer finally kills something
(less likely).]

I've been testing earlyoom, nohang, and the kernel oom-killer for > 6
months now, and I think it would be completely sane for Server and
Cloud products to enable earlyoom by default for fc32, while
evaluating other solutions that can be more server-oriented (e.g.
nohang, oomd, possibly others) for fc33/fc34. What is clear: this
isn't going to be solved by kernel folks; the kernel oom-killer only
cares about keeping the kernel alive, it doesn't care about user space
at all.

In the cases where this becomes a problem, either the kernel hangs
indefinitely or does SIGKILL for your database or whatever is eating
up resources. Whereas at least earlyoom's first attempt is a SIGTERM
so it has a chance of gracefully quitting.

There are some concerns, those are in the devel@ thread, and I expect
they'll be adequately addressed or the feature will not pass the FESCo
vote. But as a short term solution while evaluating more sophisticated
solutions, I think this is a good call so I thought I'd just mention
it, in case you folks want to be included in the change.


-- 
Chris Murphy


Re: F32, enable fstrim.timer by default

2019-12-18 Thread Chris Murphy
On Wed, Dec 18, 2019 at 5:32 AM Martin Kolman  wrote:

> This will also trim thin LVs on thin pools (if any), right ?
>
> So not just hardware, it can even make "software" storage layouts faster
> & potentially even avoid pool exhaustion in some cases. :)

Just a reminder, the underlying unit, fstrim.service, uses 'fstrim
--fstab' so only fstab file systems are affected. The user would need
to change the unit file to use --all instead of --fstab to affect all
mounted file systems. I'll include that info in the change wiki. I
imagine the best practice is to copy the original unit file, edit it,
and use it as a drop-in unit file in /etc?
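
Something like this drop-in sketch, I think (e.g. via 'systemctl edit
fstrim.service'):

# /etc/systemd/system/fstrim.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/fstrim --all --verbose --quiet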

Unfinished change, still in progress...
https://fedoraproject.org/wiki/Changes/EnableFSTrimTimer




-- 
Chris Murphy


Re: F32, enable fstrim.timer by default

2019-12-18 Thread Chris Murphy
On Wed, Dec 18, 2019 at 11:10 AM Neal Gompa  wrote:
>
> On Wed, Dec 18, 2019 at 12:08 PM Chris Murphy  wrote:
> >
> > On Wed, Dec 18, 2019 at 10:56 AM Chris Murphy  
> > wrote:
> > >
> > > One thing I see in the Ubuntu fstrim.service unit file that I'm not
> > > seeing in the Fedora fstrim.service unit file, is a conditional for
> > > containers (line 4). I'm not sure where to ask about that. Maybe
> > > upstream systemd?
> >
> > Found it. I'm not sure if util-linux would typically be found in a
> > container base image? Probably no point in calling fstrim in that
> > case, but also doesn't hurt.
> >
> > $ sudo dnf provides /usr/lib/systemd/system/fstrim.timer
> > util-linux-2.34-3.fc31.x86_64 : A collection of basic system utilities
> > Repo: @System
> > Matched from:
> > Filename: /usr/lib/systemd/system/fstrim.timer
> >
>
> Yeah, util-linux is pretty common in some types of containers. It
> probably makes sense to send a PR to add that conditional.

Appears to be in the upstream version already.

-- 
Chris Murphy


Re: F32, enable fstrim.timer by default

2019-12-18 Thread Chris Murphy
On Wed, Dec 18, 2019 at 10:56 AM Chris Murphy  wrote:
>
> One thing I see in the Ubuntu fstrim.service unit file that I'm not
> seeing in the Fedora fstrim.service unit file, is a conditional for
> containers (line 4). I'm not sure where to ask about that. Maybe
> upstream systemd?

Found it. I'm not sure if util-linux would typically be found in a
container base image? Probably no point in calling fstrim in that
case, but also doesn't hurt.

$ sudo dnf provides /usr/lib/systemd/system/fstrim.timer
util-linux-2.34-3.fc31.x86_64 : A collection of basic system utilities
Repo: @System
Matched from:
Filename: /usr/lib/systemd/system/fstrim.timer



-- 
Chris Murphy


Re: F32, enable fstrim.timer by default

2019-12-18 Thread Chris Murphy
readd cloud@ list

On Wed, Dec 18, 2019 at 5:32 AM Martin Kolman  wrote:
>
> On Wed, 2019-12-18 at 13:11 +0100, Martin Pitt wrote:
> > Hello Chris,
> >
> > Chris Murphy [2019-12-17 22:23 -0700]:
> > > This desktop@ thread [1] about a slow device restored by enabling
> > > fstrim.service, got me thinking about enabling fstrim.timer [2] by
> > > default in Fedora Workstation. But I'm curious if it might be
> > > desirable in other Fedora Editions, and making it a system-wide
> > > change?
> >
> > This is a function/property of hardware, so it's IMO not desktop specific at
> > all. Servers suffer just as well from hard disks becoming slower.

> This will also trim thin LVs on thin pools (if any), right ?

Correct.

> So not just hardware, it can even make "software" storage layouts faster
> & potentially even avoid pool exhaustion in some cases. :)

Maybe. I'm not sure either way if someone would actually notice
performance improvements, but it wouldn't make them worse. And yes,
potentially avoid pool exhaustion, in particular in under provisioned
cases.

I forgot to mention: with qemu/kvm/libvirt VMs, the trim would not
get passed down to the backing storage due to default settings. The
discard mode "unmap" is supported with a SCSI disk using the virtio
SCSI controller; I see some curious works/doesn't-work behavior with a
"plain" virtio disk. But when it works, it does pass down to the
underlying thinp LV and raw files. If anyone is doing raw file
backups, or otherwise paying for storage, it could save them some
coins. And when it doesn't work, literally nothing happens: the file
doesn't get holes punched out, but there's also no corruption (I test
these things with a Btrfs scrub; it will consistently complain if a
single metadata or data checksum mismatches).
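
For anyone wanting to try it, the relevant bits of the libvirt domain XML
look roughly like this (paths and device names are placeholders):

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' discard='unmap'/>
  <source file='/var/lib/libvirt/images/guest.raw'/>
  <target dev='sda' bus='scsi'/>
</disk>
<controller type='scsi' model='virtio-scsi'/>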

Anyway, it's an optimization. Pretty well tested elsewhere at this
point. And offhand not aware of any liabilities, but thought I'd ask
about it before writing up a system wide change proposal.

One thing I see in the Ubuntu fstrim.service unit file that I'm not
seeing in the Fedora fstrim.service unit file, is a conditional for
containers (line 4). I'm not sure where to ask about that. Maybe
upstream systemd?

$ cat /lib/systemd/system/fstrim.service
[Unit]
Description=Discard unused blocks on filesystems from /etc/fstab
Documentation=man:fstrim(8)
ConditionVirtualization=!container

[Service]
Type=oneshot
ExecStart=/sbin/fstrim --fstab --verbose --quiet
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=no
PrivateNetwork=yes
PrivateUsers=no
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
MemoryDenyWriteExecute=yes
SystemCallFilter=@default @file-system @basic-io @system-service
chris@chris-Standard-PC-Q35-ICH9-2009:~$



-- 
Chris Murphy


F32, enable fstrim.timer by default

2019-12-17 Thread Chris Murphy
Hi,

This desktop@ thread [1] about a slow device restored by enabling
fstrim.service, got me thinking about enabling fstrim.timer [2] by
default in Fedora Workstation. But I'm curious if it might be
desirable in other Fedora Editions, and making it a system-wide
change?

I've checked recent versions of openSUSE and Ubuntu, and they have it
enabled. Therefore I estimate the likelihood of running into cons
(below) is pretty remote. Most people won't notice anything.

The pros:
+ when passed down to flash drives that support trim, it provides a
hint to the drive firmware about erase blocks ready for erasure. Some
devices will have improved wear leveling and performance as a result,
but this is firmware specific.
LVM [3] and dm-crypt [4] passdown appears to be enabled by default on Fedora.
+ with LVM thin provisioning, it will cause unused LV extents to be
returned to the thin pool for use by other LVs, kind of a nifty work
around for XFS not supporting fs shrink resize.

The gotchas:
+  Few, but highly visible, reports of buggy SSDs that corrupt or lose
data soon after trim being issued. By now, most have been blacklisted
in the kernel, and/or have manufacturer firmware updates. We shouldn't
run into this problem unless someone has older hardware that hasn't
been update and for some reason also hasn't been blacklisted in the
kernel.
+ Older SSD's have only non-queued trim support, which also can result
in a brief hang while the command is processed. This is highly
variable based on the device firmware, and the workload. But using
weekly fstrim is preferred for these devices, instead of using the
discard mount option in /etc/fstab.
+ Possible exposure of fs locality pattern may be a security risk for
some workflows. [4] [5]


[1]
https://lists.fedoraproject.org/archives/list/desk...@lists.fedoraproject.org/message/UHINXYYGEYD727HIUHF3DQ7ZPCZHXWOK/

[2]
fstrim.timer, if enabled, runs fstrim.service weekly, specifically
Monday at midnight local time; and if the system isn't available at
that time, it runs during or very soon after the next startup. The
command:
ExecStart=/usr/sbin/fstrim --fstab --verbose --quiet

fstab means only file systems in fstab are included; verbose reports
the mount point and bytes potentially discarded and is recorded in the
systemd journal; quiet suppresses errors which is typical for file
systems and devices that don't support fstrim, e.g. the EFI System
partition, which is FAT16/32; and USB flash "stick" drives, and hard
drives.

[3]
/etc/lvm/lvm.conf, if I'm reading it correctly, file system discards
are passed down:
# This configuration option has an automatic default value.
# thin_pool_discards = "passdown"

Due to this Fedora 27 feature, trim is passed down by dm-crypt as well
for LUKS volumes. Curiously, because Fedora neither sets the discard
mount option for any file system nor enables fstrim.timer, this
feature isn't being taken advantage of.

[4]
https://fedoraproject.org/wiki/Changes/EnableTrimOnDmCrypt

[5]
Trim on LUKS/dm-crypt note from upstream, section 5.19
https://gitlab.com/cryptsetup/cryptsetup/-/wikis/FrequentlyAskedQuestions#5-security-aspects

-- 
Chris Murphy


[atomic-wg] Issue #281: Figure out comprehensive strategy for atomic host container storage

2017-07-13 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
I read a list of problems already with negative arguments, not supplied by me. 
And I've presented something that obviates literally all of them. I see it as 
advice, not debate.
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/281


[atomic-wg] Issue #281: Figure out comprehensive strategy for atomic host container storage

2017-07-13 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Yeah I wasn't considering anything we don't have in anaconda, but then also 
anything not already in the Fedora kernel for some time now.

Plus ZFS lacks fs shrink, so you can't remove block devices arbitrarily; it 
also lacks online replication and seeding. So even if it weren't for 
licensing, it wouldn't be the direction I'd go in.
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/281


[atomic-wg] Issue #281: Figure out comprehensive strategy for atomic host container storage

2017-07-13 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Ahh sorry, I kinda figured realistically there are only three options: ext4, 
XFS, and Btrfs, and the only one not mentioned so far is Btrfs.

There are some hits of people using it in AWS contexts, but I have not yet run 
across Btrfs + overlayfs. So, I started a thread on linux-bt...@vger.kernel.org 
to see if anyone's using containers with Btrfs + overlayfs. Insofar as I'm 
aware it's not a pathological combination, I'm just gonna guess to the vast 
majority it seems redundant, but they each bring different things to the table.
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/281


[atomic-wg] Issue #281: Figure out comprehensive strategy for atomic host container storage

2017-07-13 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
All the partitioning, sizing, and resizing concerns mentioned in this issue 
vanish with a certain other filesystem, which does all resizes (grow, shrink, 
add and remove devices)  online and atomically and typically in a single 
command. Whether scripted or user issued, the commands are shorter, easier to 
understand, complete faster and are safer.

Gotcha though is I haven't used it with overlayfs. A cursory search yields no 
hits. But it seems sane to allow Docker to continue to use overlayfs for the 
shared page cache benefit, and even snapshotting (if Docker supports that 
overlayfs feature now?).

But the main pro is that you can have separate fstrees read-only or read-write 
mounted, but they share the same storage pool, without hard barriers between 
them.

*shrug*
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/281


[fedora-atomic] Issue #64: wireless firmware not included on ISO installations

2017-06-20 Thread Chris Murphy

chrismurphy reported a new issue against the project: `fedora-atomic` that you 
are following:
``
Version:
I tested this Fedora-Atomic-ostree-x86_64-26-20170619.n.0.iso on an Intel NUC 
(a baremetal installation).

Problem, Actual results:
The installation media and environment have wireless firmware, and wireless 
connects fine. But on reboot there's no networking, and kernel messages 
indicate the problem is due to firmware not being installed.

Expected results:

Wifi firmware should be included in this ostree repo for baremetal 
installation; or alternatively make it more clear that these images aren't 
intended for baremetal installation.
``

To reply, visit the link below or just reply to this email
https://pagure.io/fedora-atomic/issue/64


[atomic-wg] Issue #257 `atomic host tree in F26 does not boot properly`

2017-03-15 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
OK this seems bad

plymouth-start.service: Executing: /usr/sbin/plymouthd --mode=boot 
--pid-file=/var/run/plymouth/pid --attach-to-session
[3.372107] general protection fault:  [#1] SMP
[3.372654] Modules linked in: virtio_console(+) parport snd_timer 
qemu_fw_cfg snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace 
8139too qxl drm_kms_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel drm 
ghash_clmulni_intel serio_raw virtio_pci virtio_ring 8139cp mii virtio 
ata_generic pata_acpi sunrpc scsi_transport_iscsi
[3.374009] CPU: 1 PID: 667 Comm: systemd-udevd Not tainted 
4.11.0-0.rc1.git0.1.fc26.x86_64 #1
[3.374009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-1.fc25 04/01/2014
[3.374009] task: a0b87521 task.stack: af5a40758000
[3.374009] RIP: 0010:vp_modern_find_vqs+0x39/0x70 [virtio_pci]
[3.374009] RSP: 0018:af5a4075ba68 EFLAGS: 00010282
[3.374009] RAX: af5a403ed000 RBX: 2d306f6974726976 RCX: 
[3.374009] RDX: 00fc RSI: af5a403ed01c RDI: 0001
[3.374009] RBP: af5a4075ba88 R08: 0001c8e0 R09: 9341ea29
[3.374009] R10: e08981dd0c40 R11:  R12: a0b834bda708
[3.374009] R13:  R14: a0b834bda400 R15: 001f
[3.374009] FS:  7fd9211438c0() GS:a0b87fd0() 
knlGS:
[3.374009] CS:  0010 DS:  ES:  CR0: 80050033
[3.374009] CR2: 55f2eeb94728 CR3: 7610c000 CR4: 003406e0
[3.374009] Call Trace:
[3.374009]  init_vqs+0x1a0/0x2e0 [virtio_console]
[3.374009]  virtcons_probe+0xb9/0x360 [virtio_console]
[3.374009]  virtio_dev_probe+0x144/0x1e0 [virtio]
[3.374009]  driver_probe_device+0x106/0x450
[3.374009]  __driver_attach+0xa4/0xe0
[3.374009]  ? driver_probe_device+0x450/0x450
[3.374009]  bus_for_each_dev+0x6e/0xb0
[3.374009]  driver_attach+0x1e/0x20
[3.374009]  bus_add_driver+0x1d0/0x270
[3.374009]  ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
[3.374009]  driver_register+0x60/0xe0
[3.374009]  ? virtio_cons_early_init+0x1d/0x1d [virtio_console]
[3.374009]  register_virtio_driver+0x20/0x30 [virtio]
[3.374009]  init+0x9f/0xfe3 [virtio_console]
[3.374009]  do_one_initcall+0x50/0x1a0
[3.374009]  ? free_hot_cold_page+0x19a/0x300
[3.374009]  ? kmem_cache_alloc_trace+0x15f/0x1c0
[3.374009]  ? do_init_module+0x27/0x1e6
[3.374009]  do_init_module+0x5f/0x1e6
[3.374009]  load_module+0x22b7/0x2820
[3.374009]  ? __symbol_put+0x60/0x60
[3.374009]  SYSC_init_module+0x16f/0x1a0
[3.374009]  SyS_init_module+0xe/0x10
[3.374009]  do_syscall_64+0x67/0x170
[3.374009]  entry_SYSCALL64_slow_path+0x25/0x25
[3.374009] RIP: 0033:0x7fd91fda53da
[3.374009] RSP: 002b:7ffdf7f18d38 EFLAGS: 0246 ORIG_RAX: 
00af
[3.374009] RAX: ffda RBX: 55f2eeb6f6a0 RCX: 7fd91fda53da
[3.374009] RDX: 7fd9208da9c5 RSI: b37b RDI: 55f2eeb893a0
[3.374009] RBP: 7fd9208da9c5 R08: 55f2eeb74e80 R09: 0078
[3.374009] R10: 7fd92005fb00 R11: 0246 R12: 55f2eeb893a0
[3.374009] R13: 55f2eeb6c140 R14: 0002 R15: 55f2ede5dfca
[3.374009] Code: 54 53 49 89 fe e8 78 0d 00 00 85 c0 41 89 c5 75 44 49 8b 
9e 08 03 00 00 4d 8d a6 08 03 00 00 4c 39 e3 74 31 49 8b 86 38 03 00 00 <0f> b7 
7b 28 48 8d 70 16 e8 3a e6 15 d3 49 8b 86 38 03 00 00 bf 
[3.374009] RIP: vp_modern_find_vqs+0x39/0x70 [virtio_pci] RSP: 
af5a4075ba68
[3.410405] ---[ end trace a110b926d7e8d96b ]---


``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/257


[atomic-wg] Issue #185 `November 21 ISO is not bootable on UEFI`

2017-01-18 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
VM install of Fedora-Atomic-ostree-x86_64-25-20170118.1.iso to a clean LV 
succeeds, for both BIOS and UEFI firmware. Using default partitioning, the 
required layout is created. Both installations start up completely, I can log 
in, docker-pool has been created, and docker is running.
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/185


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-10 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Sorry for the confusing report.

docker-root-lv was created automatically when 
/etc/sysconfig/docker-storage-setup contained

STORAGE_DRIVER=overlay2
DOCKER_ROOT_VOLUME=yes

Upon stopping docker and issuing atomic storage reset, this LV is removed.

If I don't make changes to /etc/sysconfig/docker-storage-setup then a 
docker-pool LV (which is actually a dm thin pool) is created; and upon stopping 
docker and issuing atomic storage reset, this pool is likewise removed.
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-09 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
>dustymabe 
>Am I missing something? Did I make some bad assumptions somewhere in this test?
Nope, works for me as well. /var is still a directory on the ext4 rootfs, but 
it looks like a new LV is created at 40% of the free space in the VG, formatted 
XFS, and var-lib-docker.mount mounts it at /var/lib/docker; that mount file is 
created by the code triggered by DOCKER_ROOT_VOLUME=yes.

I did additionally try a migration from devicemapper to overlay2 using atomic 
storage export + reset + modify + import, and it does work. There is no 
automatic space recapture of the docker-root-lv LV; however, it could be 
deleted by the user after the modify step, then reboot so docker-storage-setup 
sets up the dm-thin pool, and then do the import. I'm assuming in any case that 
there needs to be temp space somewhere for the exported containers.
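
Roughly, that sequence looks something like this (from memory; treat the exact 
flags as illustrative and check 'atomic storage --help'):

systemctl stop docker
atomic storage export              # needs temp space for images/containers
atomic storage reset
atomic storage modify --driver overlay2
systemctl reboot                   # docker-storage-setup reconfigures storage
atomic storage import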
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-09 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
> dustymabe
> with DOCKER_ROOT_VOLUME and overlayfs using that then all of /var/lib/docker 
> would be taken care of. Please let me know if I'm wrong.

It'll work on a conventional installation. I'm skeptical it'll work on an 
rpm-ostree installation because /var is already a bind mount performed by 
ostree during the startup process. So I'm pretty sure ostree is going to have 
to know about the "true nature" of a separate var partition, mount it, then 
bind mount it correctly.

>I tend to think more about the cloud use case where you spin up a 
>preconfigured image. What I was referring to is having docker-storage-setup be 
>able to make the switch for us.

I don't have a strong opinion on where the proper hinting belongs to indicate 
which driver to use. The user already has to setup #cloud-config so maybe the 
hint belongs in there, and either it does something to storage which is then 
understood by docker-storage-setup, or the hint is just a baton to 
docker-storage-config to do it, just depends on which is more flexible and 
maintainable.

> This means we can essentially look at if the user provided overlay or DM and 
> do whatever they asked.
> - If they provided overlay then we can just extend the root partition and go 
> on our merry way.
> - If they also specified DOCKER_ROOT_VOLUME=yes then they want overlay on 
> another partition, did they specify a partion? yes, use that one. no, create 
> an LV.
> - If they provided DM then create new LVs and set it up just like we have 
> been doing before this discussion started.

Seems reasonable. But I have zero confidence at the moment that ostree can 
handle a separate /var file system; it's a question for Colin what assumptions 
are being made. I think it assumes /var is a directory that it bind mounts 
somewhere, and if it's really a separate volume, then something has to mount it 
first before it can be bind mounted elsewhere.

An additional trick is testing any changes against Btrfs, where mounting 
subvolumes explicitly is actually a bind mount behind the scenes. That should 
just work but...
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-09 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``

>vgoyal
>IIUC, you are saying that use a thin LV for rootfs to work around xfs shrink 
>issue? People have tried that in the past and there have been talks about that 
>many a times. There are still issues with xfs on top of thin lv and how no 
>space situation is handled etc. Bottom line, we are not there yet.

You mean thin pool exhaustion? Right now the atomic host default uses the 
docker devicemapper driver which is XFS on a dm-thin pool. So I don't 
understand why one is OK and the other isn't.

>So if we can't use rootfs on thin LV and if xfs can't be shrinked, then only 
>way to flip back to devicemapper is don't allow rootfs to use all free space.

When hosted in the cloud, isn't it typical to charge for allocated space 
whether it's actively used or not?


>jberkus
>If that reason is invalid, we should again consider making "one big partition" 
>the default for Overlay2 installations.

Yes. It's the same effort to add more space (partition, LV, raw/qcow2), make it 
an LVM PV, add it to the VG, and then let docker-storage-setup create a 
docker-pool thin pool from that extra space.
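
E.g. something like this in /etc/sysconfig/docker-storage-setup (sketch; device
and VG names are made up), and docker-storage-setup handles the pvcreate,
vgextend, and thin pool growth on its next run:

DEVS=/dev/vdb
VG=fedora-atomic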


>dwalsh
>We have tools that allow you to switch back to devicemapper if their is 
>partioning, which is why we want to keep partitioning. If this was easy to 
>switch from no partioning to partitioned, then I would agree with just default 
>to overlay without partitions.

My interpretation of jberkus "one big partition" is a rootfs LV that uses all 
available space in the VG, reserving nothing. But it's still possible to add a 
PV to that VG and either grow rootfs for continued use of overlay2; or to 
fallback to devicemapper. I don't interpret it literally to mean dropping LVM. 
You'd probably want some way of doing online fs resize as an option, and that 
requires rootfs on LVM or Btrfs, not a plain partition.
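
Roughly (sketch; device and VG names are examples, not verified on an actual
AH install):

pvcreate /dev/vdb
vgextend fedora-atomic /dev/vdb
lvextend -r -l +100%FREE fedora-atomic/root

The -r flag grows the file system along with the LV, so it can stay online.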

I think it's a coin toss having this extra space already available in the VG, 
vs expecting the admin to enlarge the backing storage or add an additional 
device, which is then added to the VG, which can then grow rootfs (overlay2) or 
be used as fallback with the Docker devicemapper driver. 

>dustymabe
>I would like to also point out that one other benefit would be to prevent 
>containers from cannibalizing your root partition.

That's not possible just by making /var a separate file system; you'd have to use 
quotas. Ostree owns /var; it must be a directory on rootfs at present.

>I prefer overlay2 and would like to see there be only one option so that we 
>can have less confusion in the future. However, giving users the choice is 
>nice as well. Maybe there is a way to achieve both on startup.

You could have two kickstarts: overlay2 and devicemapper, and each kickstart is 
specified using a GRUB menu entry on the installation media. The devicemapper 
case uses the existing kickstart and depends on the existing 
docker-storage-setup "use 40% of VG free space for a dm-thin pool"; the 
overlay2 kickstart would cause the installer to use all available space for 
rootfs, leaving no unused space in the VG.
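
The storage section of the overlay2 kickstart could be as simple as something
like this (sketch; sizes and names are illustrative, not tested):

part /boot --fstype=xfs --size=300
part pv.01 --size=1 --grow
volgroup fedora-atomic pv.01
logvol / --vgname=fedora-atomic --name=root --fstype=xfs --size=1 --grow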


``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-07 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Flipping from one to the other will take free space somewhere for the 'atomic 
storage export/import' operation to temporarily store docker images and 
containers.

A way around the xfs lack of shrink issue is to put the filesystem containing 
/var onto a thinly provisioned LV (be it a dir on rootfs or its own volume). 
After 'atomic storage reset' wipes the docker storage, issue fstrim, and all 
the previously used extents will be returned to the thin pool, which can then 
be returned to the VG, which can then be reassigned to a new docker thin pool. 
Convoluted in my opinion, but doable.

The problem I'm having migrating from devicemapper to overlay is that adding 
/var to fstab isn't working. Systemd picks it up, but no mount command is issued. 
Seems like there's a problem making sure it happens after ostree switchroot, as 
there's no /var directory prior to the ostree rootfs being set up.
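
For reference, the fstab entry itself is nothing exotic, something along the
lines of (device name is just an example):

/dev/fedora-atomic/var  /var  xfs  defaults  0 0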
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


[atomic-wg] Issue #186 `switch to overlay2`

2017-01-06 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
dwalsh mentioned a way to flip between them
https://www.spinics.net/linux/fedora/fedora-cloud/msg07620.html
Missing from that sequence is actually configuring the new storage if it 
doesn't exist yet.

I think putting custom partitioning into the hands of users, and then 
supporting those arbitrary layouts, is asking for endless trouble. Pick your 
battles, ignore the rest.

The more versatile production solution for dealing with runaway usage of space 
is quotas. But lack of familiarity causes people to keep running back to the 
familiar torture of fs resize and repartitioning. I'm hopeful the storaged and 
Cockpit folks will one day help solve this. Partitioning to solve these 
problems is so last century.
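
E.g. with XFS project quotas, capping something like /var/lib/docker is a
couple of commands (sketch; assumes the file system is mounted with prjquota,
and the project ID and limit are made up):

xfs_quota -x -c 'project -s -p /var/lib/docker 42' /var
xfs_quota -x -c 'limit -p bhard=50g 42' /var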
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/186
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Fedora 26 change: using overlayfs as default

2016-12-13 Thread Chris Murphy
On Tue, Dec 13, 2016 at 8:01 AM, Daniel J Walsh <dwa...@redhat.com> wrote:
>
> The only way to change from one storage to the other is to use
>
> atomic storage export
> change the config
> atomic storage reset
> atomic storage import

Nifty.

A migration tool would have to juggle the potential for insufficient
space in /var for the export; or sufficient space for export but then
not importing. And then there's cleanup of otherwise dead space used
by device mapper. So possibly more than one fs resize is necessary.
I'd say probably leave things alone for upgrades, but documenting a
strategy for migrating to overlay is OK.


-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Fedora 26 change: using overlayfs as default

2016-12-12 Thread Chris Murphy
On Mon, Dec 12, 2016 at 3:13 PM, Josh Berkus <jber...@redhat.com> wrote:
> On 12/12/2016 02:12 PM, Dusty Mabe wrote:
>>
>> After I get a bug[1] fixed and out the door I'm going to publish
>> a blog post/docs on setting up Fedora 25 Atomic host and/or Cloud
>> base to use overlay2 as the storage driver for docker.
>>
>> I'd like for everyone that can to test this out and to start running
>> their container workloads with overlay2 with selinux enabled and let's
>> file bugs and get it cleaned up for Fedora 26 release.
>>
>> Should we file this as a "change" for Fedora 26?
>
> I'd say so, yes.

I suggest it be discussed by all the work groups, on devel@. It might
turn out that Fedora Atomic Host goes first, and there may be some
variation (Atomic Host has no need for LVM although it doesn't hurt,
where Server would almost certainly want to keep it, and Workstation
could flip a coin).


> Also, someone needs to test the case of migrating an existing system and
> how that looks.

It'd need a test for enough free space on /var, which first needs an
estimate of the size of every container image in the thinly provisioned
storage; stop docker and change the configuration to use the overlay
driver instead of the device mapper driver; start docker, and import all
the tar'd containers.
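
Roughly (untested sketch; the free space check is hand-waved, and the exact
driver knob in docker-storage-setup is an assumption):

atomic storage export      # tars up images/containers somewhere under /var
systemctl stop docker
# edit /etc/sysconfig/docker-storage-setup, e.g. STORAGE_DRIVER=overlay2
atomic storage reset
systemctl start docker
atomic storage import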


-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


[atomic-wg] Issue #185 `November 21 ISO is not bootable`

2016-12-08 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Attached logs:
program.log: /atomic-wg/issue/raw/files/cb8148271c4c88af8e1abebf3d2b725e0f44eaa2ffc65b837c6424c061eb4755-program.log
storage.log: /atomic-wg/issue/raw/files/50959389d3087c198eea954198fb25e775139a28dc23ee0d7f0e38dbc99c6d06-storage.log
anaconda.log: /atomic-wg/issue/raw/files/20f87acfd264c8a61b384500b5a326831ce96b4d661ecef50bab1e290d7a41b1-anaconda.log
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/185
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


[atomic-wg] Issue #185 `November 21 ISO is not bootable`

2016-12-08 Thread Chris Murphy

chrismurphy added a new comment to an issue you are following:
``
Fedora-Atomic-ostree-x86_64-25-20161207.0.iso in virt-manager set to use UEFI; 
and default automatic partitioning.

program.log
12:04:03,606 INFO program: Running... efibootmgr
12:04:03,647 INFO program: EFI variables are not supported on this system.
12:04:03,648 DEBUG program: Return code: 2
12:04:03,648 INFO program: Running... efibootmgr -c -w -L Fedora -d
/dev/vda -p 1 -l \EFI\fedora\shim.efi
12:04:03,658 INFO program: EFI variables are not supported on this system.
12:04:03,659 DEBUG program: Return code: 2
12:04:03,660 INFO program: Running... grub2-mkconfig -o
/boot/efi/EFI/fedora/grub.cfg
12:04:04,055 INFO program: /usr/bin/grub2-editenv: error: cannot
rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No
such file or directory.
12:04:04,056 INFO program: /sbin/grub2-mkconfig: line 247:
/boot/efi/EFI/fedora/grub.cfg.new: No such file or directory
12:04:04,057 DEBUG program: Return code: 1


However, if I get to a vt and run efibootmgr there is no error. So I'm
not sure why anaconda has a problem running it. The last two errors
likewise don't make sense on their own, so to try and reproduce the
problem I tried:

# chroot /mnt/sysimage
chroot: failed to run command '/bin/sh': No such file or directory

Huh. So that usually works on netinstalls and lives. And /bin/sh does
exist, it's a symlink to bash and /bin/bash does exist also. So I'm
still confused.
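
For anyone else poking at this, a couple of quick sanity checks from a vt
(illustrative):

ls /sys/firmware/efi/efivars | wc -l    # non-zero means the kernel sees EFI variables
efibootmgr -v                           # should list boot entries without complaining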
``

To reply, visit the link below or just reply to this email
https://pagure.io/atomic-wg/issue/185
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: [atomic-wg] Issue #185 `November 21 ISO is not bootable`

2016-12-08 Thread Chris Murphy
Using virt-manager set to use UEFI

program.log
12:04:03,606 INFO program: Running... efibootmgr
12:04:03,647 INFO program: EFI variables are not supported on this system.
12:04:03,648 DEBUG program: Return code: 2
12:04:03,648 INFO program: Running... efibootmgr -c -w -L Fedora -d
/dev/vda -p 1 -l \EFI\fedora\shim.efi
12:04:03,658 INFO program: EFI variables are not supported on this system.
12:04:03,659 DEBUG program: Return code: 2
12:04:03,660 INFO program: Running... grub2-mkconfig -o
/boot/efi/EFI/fedora/grub.cfg
12:04:04,055 INFO program: /usr/bin/grub2-editenv: error: cannot
rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No
such file or directory.
12:04:04,056 INFO program: /sbin/grub2-mkconfig: line 247:
/boot/efi/EFI/fedora/grub.cfg.new: No such file or directory
12:04:04,057 DEBUG program: Return code: 1


However, if I get to a vt and run efibootmgr there is no error. So I'm
not sure why anaconda has a problem running it. The last two errors
likewise don't make sense on their own, so to try and reproduce the
problem I tried:

# chroot /mnt/sysimage
chroot: failed to run command '/bin/sh': No such file or directory


Huh. So that usually works on netinstalls and lives. And /bin/sh does
exist, it's a symlink to bash and /bin/bash does exist also. So I'm
still confused.

Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: List of F26 features from Atomic Working Group

2016-11-27 Thread Chris Murphy
It might be beyond the scope of Fedora 26, but I'd like to evaluate
the liabilities (pros, cons and gotchas) of supporting all possible
user defined layouts (within reason) out of the box. That is, ext4,
XFS, Btrfs, overlay(fs), dm thin. Surely this is a boolean problem,
and the setup just needs to know what's being used, and automagically
do the right thing.

The most obvious flaw with this idea is that the move to overlayfs is
intended to shed the baggage of docker-storage-setup and LVM thin, but
ideally I'd like to see atomic support a few sane (whatever that's
defined to be) layouts and automatically use them, so we can better
figure out what works well, and what works poorly, for various use
cases.

Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Cloud and Server Q

2016-10-05 Thread Chris Murphy
On Wed, Oct 5, 2016 at 11:57 AM, Josh Berkus <jber...@redhat.com> wrote:
> On 10/04/2016 01:38 PM, Matthew Miller wrote:
>> On Tue, Oct 04, 2016 at 12:58:05PM -0700, Josh Berkus wrote:
>>> What this is sounding like is a huge discrepancy between what the
>>> Council, PRD group, etc. think we should be doing and what we can
>>> actually do.
>>>
>>> Given that, I think I should tell the designer to push the design
>>> changes back.
>>
>> I don't see how that follows. In the ideal — and I think most likely,
>> since the bugs making F25 not work are being knocked off — case, we'll
>> have Atomic built on F25 at F25 GA date. In the less ideal case, we'll
>> keep shipping the F24-based one, but there's no reason that can't work
>> with the new Atomic-focused design. For that matter, we could launch
>> that _before_ the GA.
>
> So, I'm looking at this from a user perspective.
>
> * F25 is announced
> * User goes to getfedora.org, sees new "atomic" icon.
> * User clicks through
> * User sees that Atomic is still F24.
>
> From that point, one of two things happens:
>
> 1. User files a bug, and we're flooded with "atomic download page not
> updated" bugs, or
>
> 2. user decides that Atomic isn't a real thing and never goes back.
>
> I really don't see a flow that results in the user checking back two
> weeks later to see if Atomic has been updated yet.  Especially since
> we're dealing with a substantial issue with SELinux and it's not
> guaranteed that there will be an F25 atomic release 2 weeks later, either.
>
> You are the Project Leader, and you can certainly say "do it anyway".
> But please understand why I think it's not a great idea.

There's roughly 5 weeks to GA to get atomic stuff sorted out, which
sounds like there's some padding available.

Option A: Burn the midnight oil and commit to the Atomic landing page
and its deliverables.

Option B:  Ask design folks if they're willing and able to be prepared
with contingency: swap out the planned new Atomic landing page for an
updated version of the current Cloud landing page, if Atomic isn't
ready by X days before GA.

If you have to pull the contingency, I think you can swap out Cloud for
Atomic at any point within Fedora 25's lifetime to underscore the new
emphasis.

I think it's right to say instead of pressure cooker May and November,
that it's a lighter effort more broadly distributed. I don't see a big
problem with a change in branding midstream for a release, but I'm not
a marketing type.



-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Cloud and Server Q

2016-10-04 Thread Chris Murphy
On Tue, Oct 4, 2016 at 11:59 AM, Josh Berkus <jber...@redhat.com> wrote:
> On 10/04/2016 08:13 AM, Colin Walters wrote:
>>
>> On Tue, Oct 4, 2016, at 09:46 AM, Paul W. Frields wrote:
>>> >
>>> > I think mattdm would agree we don't want to potentially,
>>> > *indefinitely* block a six-month release with a deliverable that can
>>> > be fixed and re-released in two weeks.
>> It's not that simple - this is a messy topic.  What I think this
>> is about isn't delaying or blocking - it's *prioritization*.  If
>> an issue comes up in Anaconda or systemd or whatever
>> that affects the "next AH", we need those teams to priortize
>> those fixes the same as they do for Workstation or Server.
>
> Yes, this is exactly the problem I'm raising.  We've had an issue with
> F25-base Atomic not booting for a couple weeks now, and until the last
> couple of days, nobody has been working on it.  It seems to be a simple
> fact of the Fedora release cycle that if something isn't
> release-blocking, it doesn't get done. This isn't new, it's an issue
> which has plagued Fedora Atomic for, as far as I can tell, its entire
> existence.

Perhaps in some cases, but it's not always true. Spins get done even
though they're not release blocking. The issue with release blocking
status is it compels the expert in the particular area of failure to
become involved. And that is a limited resource. Possibly a big part
of the reason for Atomic failures is that there's a lack of documentation
across the board, covering both the ostree stuff and releng's processes, and
then when ostree failures happen the logs are often so lacking in detail
that a Tarot card reader might have a better chance of guessing what's
going on. This makes it difficult to get contributors involved. And it
makes it damn near impossible that any of them would want to become even
intermediately competent - it's a heavy investment.



-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Current Fedora 25 Atomic images are failing to boot

2016-10-03 Thread Chris Murphy
On Mon, Oct 3, 2016 at 1:06 PM, Colin Walters <walt...@verbum.org> wrote:
> On Mon, Oct 3, 2016, at 02:57 PM, Dusty Mabe wrote:
>
>> There is a kernel panic happening early in boot. Here is the serial
>> console log from one of those boots:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1380866


Does this need to be marked as a freeze exception to back the change
out for beta candidate images to work?

-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Cloud and Server Q

2016-09-30 Thread Chris Murphy
On Fri, Sep 30, 2016 at 3:31 PM, Josh Berkus <jber...@redhat.com> wrote:
> On 09/30/2016 02:01 PM, Josh Boyer wrote:
>
>> 16:44:56  Cloud base image is the only blocking deliverable.
>> 16:44:59  Atomic is not.
>>
>> I realize this WG is in the middle of rebooting itself, but to have
>> clearly conflicting information from the WG members is a bit
>> concerning.
>
> Kushal?
>
> Based on my attendence at the Cloud WG meetings, I had the understanding
> that Atomic was becoming our main deliverable.  If that's not true, then
> I need to pull a whole bunch of changes and put them on ice until Fedora 26.

What also matters is the understanding of others who needed to
understand this. To me it sounds like a baton was dropped. But moving
forward...

What does release blocking mean? There are a bunch of QA criteria and
test cases that help make sure those criteria are met. There are no
atomic host specific criteria or test cases that I'm aware of. I
expect QA probably can't provide significant assistance in QAing the
atomic qcow2 image for this release. How big of a problem is that? Is
there a Fedora policy that requires a default download product to be
QA'd somehow, or to be release blocking? Can Cloud WG take the lead
QA'ing the atomic qcow2 image? What are the releng implications of it
not being release blocking?

For example, during the Fedora 24 cycle there was a neat bug in the
compose process that caused some images to fail. It wasn't possible to
just do another compose and cherry pick the working ISOs from two
different composes (I forget why). Is there anything like that here,
or is there sufficiently good isolation between ostree images and
other images? What happens if release is a go for everything else, but
atomic qcow2 is not working? What I've heard is "fix the problem and
remake the image" similar to the current two week cycle. Does releng
agree, and will there be time between a Thursday "go" and Tuesday
(whatever day it is) "release" to get an atomic qcow2 built and on
getfedora? What if there isn't? What if it's a week after release
before there's a working one?

If the liabilities there can be sorted out satisfactorily I'd say
proceed with Atomic on getfedora.

Next issue is Cloud Base images. Cloud WG needs to decide if these are
going to be created and if so how they're going to get linked to and
from where. Is there a designed landing page for these already? If
not, my thought is have a side bar link to a basic directory listing
for them, rather than the fancy landing page that currently exists for
Fedora 24 Cloud Base images. And demote the Cloud Base images to
non-release blocking. And then whatever contingency for that side bar
link if the Cloud Base images aren't available for release day.



-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


atomic dvd 25 default install crash

2016-09-22 Thread Chris Murphy
Still hitting this crash with a default installation of
Fedora-Atomic-dvd-x86_64-25-20160921.n.0.iso

https://bugzilla.redhat.com/show_bug.cgi?id=1375702

-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: Fedora Atomic Host Two Week Release Announcement

2016-09-21 Thread Chris Murphy
This notification still seems broken. That page says:

The latest two week build did not meet our testing criteria. The
images available are from over 22 days ago. Check the Project Atomic
blog for updates and information about Atomic blocker bugs. And the
image available for download is
https://download.fedoraproject.org/pub/alt/atomic/stable/Atomic/x86_64/iso/Fedora-Atomic-dvd-x86_64-24-20160820.0.iso


On Wed, Sep 21, 2016 at 10:45 AM,  <nore...@fedoraproject.org> wrote:
>
> A new update of Fedora Cloud Atomic Host has been released and can be
> downloaded at:
>
> Images can be found here:
>
> https://getfedora.org/en/cloud/download/atomic.html
>
> Respective signed CHECKSUM files can be found here:
> https://alt.fedoraproject.org/pub/alt/atomic/stable/Fedora-Atomic-24-20160921.0/CloudImages/x86_64/images/Fedora-CloudImages-24-20160921.0-x86_64-CHECKSUM
> https://alt.fedoraproject.org/pub/alt/atomic/stable/Fedora-Atomic-24-20160921.0/Atomic/x86_64/iso/Fedora-Atomic-24-20160921.0-x86_64-CHECKSUM
>
> Thank you,
> Fedora Release Engineering
>
> ___
> cloud mailing list -- cloud@lists.fedoraproject.org
> To unsubscribe send an email to cloud-le...@lists.fedoraproject.org
>



-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: overlayfs for AFTER Fedora 25

2016-09-20 Thread Chris Murphy
Just in case this poor horse isn't suitably beaten yet.

1. Create 4 qcow2 files per
qemu-img create -f qcow2 *.qcow2 120g

Each qcow2 starts out 194K (not preallocated).
2. Format each qcow2
mkfs.ext4 
mkfs.ext4 -i 4096 
mkfs.xfs 
mkfs.btrfs 

3. mount each fs (mainly to be fair since ext4 does lazy init) and
wait until the qcow2 stops growing.

5.5M -rw-r--r--. 1 qemu qemu 5.9M Sep 19 20:40 bios_btrfs.qcow2
2.1G -rw-r--r--. 1 root root 2.1G Sep 19 20:27 bios_ext4_default.qcow2
7.7G -rw-r--r--. 1 root root 7.7G Sep 19 20:33 bios_ext4_i4096.qcow2
 62M -rw-r--r--. 1 qemu qemu  62M Sep 19 20:40 bios_xfs.qcow2


Btrfs and XFS take seconds to completely initialize. Ext4 defaults
took 6 minutes, and with -i 4096 it took 8 minutes to complete lazy
init.
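
(For what it's worth, to pay that cost up front at mkfs time instead of
lazily, something like this should do it; device name is just an example.)

mkfs.ext4 -i 4096 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdb1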


-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: overlayfs for AFTER Fedora 25

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 2:15 PM, Chris Murphy <li...@colorremedies.com> wrote:

> You'd need to
> run all this by them and see if there's a way to do a mkfs.ext4 -i
> 4096 for just Atomic Host installations, there's no point doing that
> for workstation installations. Or just use XFS.

Another possibility is an AH specific /etc/mke2fs.conf file on the
installation media only.


[defaults]
base_features =
sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
default_mntopts = acl,user_xattr
enable_periodic_fsck = 0
blocksize = 4096
inode_size = 256
inode_ratio = 16384


By changing inode_ratio = 4096, it achieves the same outcome as -i
4096 without having to pass that flag at mkfs time. And it'd only
affect installation time file systems (including /boot and / as well
as the persistent storage for overlayfs and containers). So... yeah.

FWIW, you're basically already using XFS with the dm-thin
docker-storage-setup you've got going on right now. It doesn't get
mounted anywhere, but

$ docker info
[chris@localhost ~]$ sudo docker info
[sudo] password for chris:
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: fedora--atomic-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs


So, just use XFS across the board (plus overlayfs on the persistent
storage for containers).

As for Workstation changing file systems, that's another ball of wax.
I'd just say use XFS + overlayfs there too to keep it simple across
the various products in the near term. And then presumably the
Workstation folks will want Btrfs when it's sufficiently stable that the
kernel team won't freak out if there's still no Btrfs specific kernel
dev on the team or at Red Hat.


-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: overlayfs for AFTER Fedora 25

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 10:56 AM, Chris Murphy <li...@colorremedies.com> wrote:

> Inode exhaustion?
>
> If the installer is going to create the file system used for overlayfs
> backing storage with ext4, that probably means mkfs.ext4 -i 4096 will
> need to be used; so how does that get propagated to only AH installs,
> for both automatic and custom partitioning? Or figure out a way to
> drop custom/manual partitioning from the UI. Or does using XFS
> mitigate this issue? A simple search turns up no inode exhaustion
> reports with XFS. The work around for ext4 is at mkfs time, it's not
> something that can be changed later.

I just did some more digging, and also chatted with Eric Sandeen about
this. Here's what I've learned:

- Inode exhaustion with mkfs.ext4 defaults can be a real thing with
overlayfs [1]
- mkfs.ext4 -i 4096 will make 1 inode per 4096 byte block, so 1:1,
which is a metric shittonne of inodes
- a different -i value might be more practical most of the time, but
if the maximum number aren't created at mkfs time and they get exhausted, the fs
basically face plants and no more files can be created; and it's only
fixable by a.) deleting a bunch of files or b.) creating a new file
system to have more inodes preallocated.
- mkfs.ext4 hands off the actual creation of the inodes to lazy init
at first mount time, it's a lot of metadata being written to do this

- XFS doesn't have this issue, its inode allocation is dynamic (there
are limits but can be changed with xfs_growfs)
- XFS now defaults to -m crc=1, and by extension -n ftype=1, which
overlayfs wants for putting filetype in the directory entry; Fedora 24
had a sufficiently new xfsprogs for this from the get go.
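
A couple of quick checks along those lines (mount point is just an example):

df -i /var                    # inode usage/headroom on ext4
xfs_info /var | grep ftype    # ftype=1 is what overlayfs wants on XFS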

I don't know what the workflow is for creating the persistent storage
for the host, whether this will be Anaconda's role or something else?
If Anaconda, my experience has been the Anaconda team are reluctant to
use non-default mkfs unless there's a UI toggle for it. You'd need to
run all this by them and see if there's a way to do a mkfs.ext4 -i
4096 for just Atomic Host installations, there's no point doing that
for workstation installations. Or just use XFS.



[1]
https://github.com/coreos/bugs/issues/264
https://github.com/boot2docker/boot2docker/issues/992

-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: overlayfs for AFTER Fedora 25

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 7:02 AM, Colin Walters <walt...@verbum.org> wrote:
>
>
> On Thu, Sep 15, 2016, at 09:57 AM, Dusty Mabe wrote:
>
>> That is correct, but changing a default like that might be a bad idea.
>> My opinion is that it should happen on a major release boundary.
>
> One thing this impacts is the AH partitioning - it no longer makes
> sense by default with overlayfs.  I think we should probably do exactly
> the same thing as the Server SIG (and consider doing it for Workstation
> too), which actually argues for just fixing the Anaconda defaults.
>
> Server thread:
> https://lists.fedoraproject.org/archives/list/ser...@lists.fedoraproject.org/thread/D7ZK7SILYDYAATRFS6BFWZQWS6KSRGDG/

The genesis of that was me pointing to Cloud Atomic ISO's handling;
since Server went with pretty much the identical layout, it managed to
get slipped in for Alpha. It was a proven layout. [1]

For an Atomic Host overlayfs based layout, there's nothing within
Fedora that's a proven layout. For starters, it could be something
much simpler than what CoreOS is doing [2]. If the target installation
is VM, then dropping LVM stuff makes sense. If it's going to include
baremetal, keeping LVM makes sense. I'm a bit unclear on this point,
but with a handwave it sorta feels like Cloud->Container WG is far
less interested in the baremetal case, whereas Server is about as
interested in baremetal as VM and container cases. If that's true,
then the CoreOS layout is a decent starting point, and just needs some
simplification to account for ostree deployments rather than partition
priority flipping.

Inode exhaustion?

If the installer is going to create the file system used for overlayfs
backing storage with ext4, that probably means mkfs.ext4 -i 4096 will
need to be used; so how does that get propagated to only AH installs,
for both automatic and custom partitioning? Or figure out a way to
drop custom/manual partitioning from the UI. Or does using XFS
mitigate this issue? A simple search turns up no inode exhaustion
reports with XFS. The work around for ext4 is at mkfs time, it's not
something that can be changed later.

Release blocking and custom partitioning?

Upon AH image becoming release blocking, then "The installer must be
able to create and install to any workable partition layout using any
file system and/or container format combination offered in a default
installer configuration. " applies. Example bug [3] where this fails
right now. Does it make sense for AH installations to somehow be
exempt from custom partitioning resulting in successful installations?
And what would that look like criterion wise (just grant an
exception?) or installer wise (drop the custom UI or put up warnings
upon entering?)



[1]
https://lists.fedoraproject.org/archives/list/ser...@lists.fedoraproject.org/thread/PLWNOM6Z5226VZYUHTL6KMS3553VSQ3W/

[2]
https://coreos.com/os/docs/latest/sdk-disk-partitions.html
Trivial pursuit: this "GPT priority attribute", which I can find
nowhere else, but I rather like this idea of using an xattr on a
directory as the hint for which fs tree the bootloader should use
rather than writing out new bootloader configurations.

[3]
https://bugzilla.redhat.com/show_bug.cgi?id=1289752




-- 
Chris Murphy
___
cloud mailing list -- cloud@lists.fedoraproject.org
To unsubscribe send an email to cloud-le...@lists.fedoraproject.org


Re: overlayfs for AFTER Fedora 25

2016-09-14 Thread Chris Murphy
On Wed, Sep 14, 2016 at 2:45 PM, Jason Brooks <jbro...@redhat.com> wrote:
> On Wed, Sep 14, 2016 at 12:14 PM, Dusty Mabe <du...@dustymabe.com> wrote:
>>
>> In the cloud meeting today I brought up overlayfs and F25. After
>> discussing with the engineers closer to the technology they recommend
>> waiting to move to overlayfs as the default in F26.
>>
>> I think this will work well because it will give us some time to allow
>> people to "try" overlayfs in F25 (we should provide good docs on this)
>> and then give us feedback before we go with it as default in F26. If
>> the feedback is bad then maybe we wouldn't even go with it in F26, but
>> hopefully that won't be the case.
>>
>> Thoughts?
>
> Sounds good to me.

I'm  uncertain if this is current or needs an update:

Evaluate overlayfs with docker
https://github.com/kubernetes/kubernetes/issues/15867

If the way forward is a non-duplicating cache then I see a major
advantage gone. But that alone isn't enough to promote something else,
I'd just say, hedge your bets. Pretty much all the reasons why CoreOS
switched from Btrfs to overlay have been fixed, although there's a
asstrometric ton of enospc rework landing in kernel 4.8 [1] that will
need time to shake out, and if anyone's able to break it, one of the
best ways of getting it fixed and avoiding regressions is to come up
with an xfstests [2] test case so it can be cleanly reproduced. The Facebook
devs consistently report finding hardware (even enterprise stuff that
they use) doing batshit things that Btrfs catches and corrects that
other filesystems aren't seeing. And then on the slow downs mainly due
to fragmentation when creating and destroying many snapshots over a
short period of time, this probably could be mitigated with garbage
collection optimization, and I've had some ideas about that if anyone
wants to futz around with it.

The more conservative change is probably XFS + overlayfs though, since
now XFS checksums fs metadata and the journal, which helps catch
problems before they get worse.


[1]
http://www.spinics.net/lists/linux-btrfs/msg53410.html

[2] semi random example
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blob;f=tests/btrfs/060


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: growpart not working in Fedora 25 cloud base

2016-09-11 Thread Chris Murphy
On Sat, Sep 10, 2016 at 10:45 PM, Dusty Mabe <du...@dustymabe.com> wrote:
>
>
> On 09/11/2016 12:22 AM, Dusty Mabe wrote:
>>
>>
>> On 09/10/2016 12:47 PM, Chris Murphy wrote:
>>> On Fri, Sep 9, 2016 at 9:28 PM, Dusty Mabe <du...@dustymabe.com> wrote:
>>>>
>>>>
>>>> Should I open a bug for this? Can we get someone to look at it/work on it?
>>>
>>> Yes, and I think it needs a dmesg in case partprobe was called but
>>> that failed for some reason. And then need to look at the cloud-init
>>> code and see if partprobe is being called. This is not the best log,
>>> it doesn't report the actual commands its using and the exit code for
>>> each command. So we're left wondering if partprobe was called or not.
>>> Maybe it's being called but is missing in the image?
>>
>> I opened a bug here and added some more information to it:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1374968
>>
>
> and.. this has actually already been reported and the fix is in
> updates-testing.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1371761

Good catch. Looks like it affects udisks2/storaged as well.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: growpart not working in Fedora 25 cloud base

2016-09-10 Thread Chris Murphy
On Sat, Sep 10, 2016 at 12:05 PM, Chris Murphy <li...@colorremedies.com> wrote:
> Could be related this bug, I see the same error there after Disks
> changes partitioning, and the old partition table is being used. A
> feature for Fedora 25 is udisks is replaced by storaged, so this could
> be part of that problem. But I have no idea why could-init would be
> using udisks or storaged, so this might be a goose chase.
>
> Error setting partition type after formatting
> https://bugzilla.redhat.com/show_bug.cgi?id=1374334

It could be that both cloud-init and this storaged bug are hitting the same
lower-level bug, causing the kernel to not get an updated partition
table (via partprobe)... the journal output in that bug isn't
enlightening, there are no kernel messages. One reason it'd fail is if
something has mounted any file system on the device that's having its
partition modified, that'd make it busy, and partprobe tends to fail
in that case.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: growpart not working in Fedora 25 cloud base

2016-09-10 Thread Chris Murphy
Could be related to this bug; I see the same error there after Disks
changes partitioning, and the old partition table is being used. A
feature for Fedora 25 is that udisks is replaced by storaged, so this could
be part of that problem. But I have no idea why cloud-init would be
using udisks or storaged, so this might be a goose chase.

Error setting partition type after formatting
https://bugzilla.redhat.com/show_bug.cgi?id=1374334



Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: growpart not working in Fedora 25 cloud base

2016-09-10 Thread Chris Murphy
On Fri, Sep 9, 2016 at 9:28 PM, Dusty Mabe <du...@dustymabe.com> wrote:

Stderr: "attempt to resize /dev/sda failed. sfdisk output below:\n|
Backup files:\n|
MBR (offset 0, size   512):
/tmp/growpart.PO8wWI/backup-sda-0x.bak\n| \n|
Disk /dev/sda: 10 GiB, 10737418240 bytes, 20971520 sectors\n|
Units: sectors of 1 * 512 = 512 bytes\n|
Sector size (logical/physical): 512 bytes / 512 bytes\n| I/O size
(minimum/optimal): 512 bytes / 512 bytes\n| Disklabel type: dos\n|
Disk identifier: 0x1ef30347\n| \n|
Old situation:\n| \n|
Device Boot Start End Sectors Size Id Type\n|
/dev/sda1  * 2048 6291455 6289408   3G 83 Linux\n|

OK, so it starts out as 6289408/2048 = 3071 MiB (the 3G shown)

>Created a new partition 1 of type 'Linux' and of size 10 GiB.\n| /dev/sda2: \n|
New situation:\n| \n|
Device Boot Start  End  Sectors Size Id Type\n|
 /dev/sda1  * 2048 20971519 20969472  10G 83 Linux\n| \n|

New size. But what about sda2? It said it was creating a new partition
sda2, but not specifying its size, only specifying the new size of
sda1.

> The partition table has been altered.\n| Calling ioctl() to re-read partition 
> table.\n| Re-reading the partition table failed.: Device or resource busy\n| 
> The kernel still uses the old table. The new table will be used at the next 
> reboot or after you run partprobe(8) or kpartx(8).\n

Something isn't calling partprobe? Or there's a kernel error in
re-reading the device? dmesg would help, maybe.




* WARNING: Resize failed, attempting to revert **\n512+0
records in\n512+0 records out\n512 bytes copied, 0.000400551 s, 1.3
MB/s\n* Appears to have gone OK \n"

And if we are to believe this, it changed the partition table back to
the Old Situation.


> Sep 10 03:13:17 cloudhost.localdomain cloud-init[645]: [CLOUDINIT] 
> util.py[DEBUG]: resize_devices took 0.127 seconds
> Sep 10 03:13:17 cloudhost.localdomain cloud-init[645]: [CLOUDINIT] 
> cc_growpart.py[DEBUG]: '/' FAILED: failed to resize: disk=/dev/sda, ptnum=1: 
> Unexpected error while running command.
>Command: ['growpart', 
> '/dev/sda', '1']
>Exit code: 2
>Reason: -
>Stdout: 'FAILED: 
> failed to resize\n'
>Stderr: "attempt to 
> resize /dev/sda failed. sfdisk output below:\n| Backup files:\n|  MBR 
> (offset 0, size   512): /tmp/growpart.PO8wWI/backup-sda-0x.bak\n| 
> \n| Disk /dev/sda: 10 GiB, 10737418240 bytes, 20971520 sectors\n| Units: 
> sectors of 1 * 512 = 512 bytes\n| Sector size (logical/physical): 512 bytes / 
> 512 bytes\n| I/O size (minimum/optimal): 512 bytes / 512 bytes\n| Disklabel 
> type: dos\n| Disk identifier: 0x1ef30347\n| \n| Old situation:\n| \n| Device  
>Boot Start End Sectors Size Id Type\n| /dev/sda1  * 2048 6291455 
> 6289408   3G 83 Linux\n| \n| >>> Script header accepted.\n| >>> Script header 
> accepted.\n| >>> Script header accepted.\n| >>> Script header accepted.\n| 
> >>> Created a new DOS disklabel with disk identifier 0x1ef30347.\n| Created a 
> new partition 1 of type 'Linux' and of size 10 GiB.\n| /dev/sda2: \n| New 
> situation:\n| \n| Device Boot Start  End  Sectors Size Id Type\n| 
> /dev/sda1  * 2048 20971519 20969472  10G 83 Linux\n| \n| The partition 
> table has been altered.\n| Calling ioctl() to re-read partition table.\n| 
> Re-reading the partition table failed.: Device or resource busy\n| The kernel 
> still uses the old table. The new table will be used at the next reboot or 
> after you run partprobe(8) or kpartx(8).\n* WARNING: Resize failed, 
> attempting to revert **\n512+0 records in\n512+0 records out\n512 bytes 
> copied, 0.000400551 s, 1.3 MB/s\n* Appears to have gone OK \n"
>


And that just looks like a 2nd attempt which also fails.


>
>
> Should I open a bug for this? Can we get someone to look at it/work on it?

Yes, and I think it needs a dmesg in case partprobe was called but
that failed for some reason. And then we need to look at the cloud-init
code and see if partprobe is being called. This is not the best log;
it doesn't report the actual commands it's using and the exit code for
each command. So we're left wondering if partprobe was called or not.
Maybe it's being called but is missing in the image?
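
For whoever hits this next, the quick things to capture (sketch):

rpm -q cloud-utils-growpart util-linux   # versions in the image
partprobe /dev/sda; echo $?              # does a manual re-read work?
dmesg | tail -n 50                       # any kernel complaints about the re-read
lsblk /dev/sda                           # what the kernel currently thinks the layout is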



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Fedora 25

2016-09-04 Thread Chris Murphy
On Sun, Sep 4, 2016 at 6:54 AM, Benson Muite <benson_mu...@emailplus.org> wrote:
> Hi,
>
> If any of you use Fedora Atomic as a desktop, could you add a brief overview
> of why and how (workflow) you do so here:
> https://fedoraproject.org/wiki/Fedora_25_talking_points
>
> For the typical Fedora workstation user, what is needed to migrate to Fedora
> Atomic as a desktop? Does this make it easier to use remote cloud resources?

These might be better asked on desktop@ list where the Workstation WG
and users can put in their 2 cents?


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Proposal: for F26, move Cloud Base Image to Server WG

2016-08-25 Thread Chris Murphy
On Thu, Aug 25, 2016 at 3:38 PM, Stephen John Smoogen <smo...@gmail.com> wrote:
> On 25 August 2016 at 13:34, Matthew Miller <mat...@fedoraproject.org> wrote:
>>
>> We've talked about this for a while, but let's make it formal. The plan
>> is to transition from Cloud as a Fedora Edition to Something Container
>> Clustery (see https://fedoraproject.org/wiki/Objectives/ProjectFAO).
>>
>> But, we still need cloud as a _deploy target_. The FAO-container-thing
>> will continue to have cloud image deploy targets (as well as bare
>> metal). I think it makes sense to _also_ have Fedora Server as a cloud
>> deploy target.
>>
>
> Could we make sure that whatever targets we have are actually getting
> tested? The fact that autocloud has said it was broken for months but
> the cloud sig wasn't looking or fixing says that before we get to step
> 2, we need to say 'is anyone more than 2 people really interested?' It
> should be ok to say 'no we aren't.' without people diving into the
> fire trying to rescue something that unless it was on fire they
> wouldn't have helped.

There are a lot of images being produced and I have no idea if they're
really needed. That a release blocking image (cloud base qcow2) nearly
caused F25 alpha to slip because it was busted at least suggests it
probably shouldn't be release blocking anymore. FWIW, cloud base qcow2
now gets grub2 in lieu of extlinux as the workaround for the
breakage.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


don't use docker 1.10.3-3 or -4

2016-04-25 Thread Chris Murphy
https://bugzilla.redhat.com/show_bug.cgi?id=1330294
https://bugzilla.redhat.com/show_bug.cgi?id=1322909

Beware of 1.10.3-4 in tree 24.16. And possibly -3 also although I
don't know what tree that's in. The only way I could fix it was dnf
remove docker then dnf reinstall docker; upgrading to -5 doesn't fix
the problem, the -4 version had to be removed first. So in particular
for atomic users this is not good because neither rollback nor updates
will fix the problem. And I don't know what the problem is, so I don't
know how to fix it other than with the dnf remove hammer.
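
i.e. roughly, on a non-atomic install:

rpm -q docker       # check whether you're on 1.10.3-3 or -4
dnf remove docker
dnf install docker  # fresh install, which should pull in -5 or later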



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [Marketing] Re: [MAGAZINE PROPOSAL] Fwd: [DRAFT] Why we're retiring 32-bit Images (was Re: Retiring 32-bit images)

2016-04-19 Thread Chris Murphy
On Tue, Apr 19, 2016 at 2:48 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:
> On Tue, 2016-04-19 at 13:48 -0600, Chris Murphy wrote:

>> Any i686 package that fails to build means it's failed for all primary
>> archs, because i686 is a primary arch. And a failed build means it
>> won't be tagged for compose so depending on the package it could hold
>> up composes.
>
> True, though I hadn't actually mentioned that scenario. But indeed. Say
> we needed a fix to dracut, pronto, to make the x86_64 cloud base image
> boot, but the build with the fix failed on i686: that would have to be
> dealt with somehow. Good point.

Oh and about terminology, it may be here that "block" gets reused as
a term in a confusing way. If dracut build fails on i686, that
"blocks" composes. But it's really a kind of claw back: zombie i686 is
grabbing the leg of other primary archs, and that stops the workflow.

Making i686 secondary would prevent this?


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [Marketing] Re: [MAGAZINE PROPOSAL] Fwd: [DRAFT] Why we're retiring 32-bit Images (was Re: Retiring 32-bit images)

2016-04-19 Thread Chris Murphy
On Tue, Apr 19, 2016 at 2:48 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:
> On Tue, 2016-04-19 at 13:48 -0600, Chris Murphy wrote:

>
>> From my limited perspective, such non-functional failure held up
>> release when it violated a release criterion in effect because that
>> non-functionality became coupled with image blocking, i.e. if kernel
>> doesn't function, then image doesn't function/is DOA, DOA images are a
>> release criteria violation, therefore block. Correct? Or is there some
>> terminology nuance here that I'm still missing in the sequence?
>
> No, even in this case there is no release blocking impact, because
> nothing release blocking is broken by the bug. The i686 images are not
> release blocking, end of story. Even if they are completely DOA, that
> does not block release.

Yes, I meant i686 in the past tense.

OK so I think I get it. i686 is officially primary, but in practice
it's at best secondary. And that should be made official. TBD whether
there's even enough people power and momentum to support it as
secondary.


>> It's best to assume I don't understand the terms well enough to use
>> them precisely, rather than assume I'm trying to redefine them.
>
> I was not actually thinking of you there (I just picked your post to
> reply to since it was at the top of the pile), more the vagueness in
> the thread in general.

Got it.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [Marketing] Re: [MAGAZINE PROPOSAL] Fwd: [DRAFT] Why we're retiring 32-bit Images (was Re: Retiring 32-bit images)

2016-04-19 Thread Chris Murphy
On Tue, Apr 19, 2016 at 1:11 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:

>
> QA referred the question of whether upgrades from a release where i686
> was 'release blocking' (<24) to releases where i686 is 'non blocking'
> (>23) should be considered 'release blocking' to FESCo. i.e. if there
> are violations of the release criteria in this upgrade path, should we
> treat that as blocking the Beta or Final releases. FESCo's decision was
> "no".

So no matter what, all i686 images (qcow2, raw, ISOs) are non-blocking.

Any i686 package that fails to build means it's failed for all primary
archs, because i686 is a primary arch. And a failed build means it
won't be tagged for compose so depending on the package it could hold
up composes.

But the current i686 problems aren't package build failures, rather
it's a particular critical path package (or two) that are broadly or
entirely non-functional when executed. So what's it called when a
critical path package fails to function on a primary arch? And what's
done about it?

From my limited perspective, such non-functional failure held up
release when it violated a release criterion in effect because that
non-functionality became coupled with image blocking, i.e. if kernel
doesn't function, then image doesn't function/is DOA, DOA images are a
release criteria violation, therefore block. Correct? Or is there some
terminology nuance here that I'm still missing in the sequence?


> I really think it would help if we use these terms carefully and
> precisely, and if we're going to re-define them in any way, make that
> clear and explicit.

It's best to assume I don't understand the terms well enough to use
them precisely, rather than assume I'm trying to redefine them.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [Marketing] Re: [MAGAZINE PROPOSAL] Fwd: [DRAFT] Why we're retiring 32-bit Images (was Re: Retiring 32-bit images)

2016-04-18 Thread Chris Murphy
On Mon, Apr 18, 2016 at 5:31 PM, Dennis Gilmore <den...@ausil.us> wrote:
> On Monday, April 18, 2016 2:59:18 PM CDT you wrote:
>> On 04/15/2016 05:28 PM, Joe Brockmeier wrote:
>> > On 04/15/2016 10:38 AM, Dennis Gilmore wrote:
>> >> I would like us to demote them to secondary.
>> >
>> > Why? We've already decided to drop. I'm not opposed, just curious why.
>> > IIRC we were hitting a major problem with kernel compat as well?
>>
>> Pinging on this - I thought we'd reached a decision and wanted to
>> publicize that sooner than later.
>>
>> If there's a reason to prefer move to secondary, let's discuss.
>>
>> Best,
>>
>> jzb
>
> I prefer to move it to secondary because people could be  relying on it still,
> it gives us a way to move forward and not be blocked on 32 bit x86. If it does
> not work then it will not get shipped. Just dropping them on the floor does
> not give as smooth a transition, nor does it give people that want it still
> the chance to pick it up and continue to carry it forward.


Is the context Cloud, or in general? I think going from primary for
all products to totally dropping it is a problem, even if install
media is non-blocking. I have no stake in i686 at all, and I think
Cloud and Server are less affected by totally dropping i686 than
Workstation; but I think quitting i686 cold turkey needs
reconsideration.

Anyway I think no one has done anything wrong here, but the warnings
of the kernel team were maybe considered something like, "oh, we'll
get by one more release or two by the skin of our teeth before it
blows up" and yet it just turned out that it's blowing up already.

If the idea is we should block on i686 in general for upgrading, I'd
agree, even though it's a pain.

For Cloud, maybe the way forward at worst is to support Cloud Atomic.
And the images are i686 only? Of course that assumes any problems with
binutils and the kernel, or whatever else comes up, are sanely fixable with
a best effort.

?

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO autopartitioning

2016-03-11 Thread Chris Murphy
On Fri, Mar 11, 2016 at 10:14 AM, Dennis Gilmore <den...@ausil.us> wrote:
> On Friday, March 11, 2016 9:58:54 AM CST Chris Murphy wrote:
>> Hi,
>>
>> The installer autopart in Cloud Atomic ISO leaves a bunch of free
>> space in the VG, which on first boot is turned into a dm thin pool by
>> docker-storage-setup. This is quite cool, so I'm suggesting it for
>> Server (minus the auto configuration part), but I can't tell where the
>> code is that alters the installer's autopartitioning behavior. The
>> kickstart file says it's using autopart, it doesn't have a breakdown
>> of what it's asking the installer to do, so I guess by virtue of it
>> being a Cloud productized installer it knows to do this.
>>
>> Suggestions?
>
>
> The code that overrides anaconda's defaults lives in fedora-productimg-atomic

Thanks!

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Cloud Atomic ISO autopartitioning

2016-03-11 Thread Chris Murphy
Hi,

The installer autopart in Cloud Atomic ISO leaves a bunch of free
space in the VG, which on first boot is turned into a dm thin pool by
docker-storage-setup. This is quite cool, so I'm suggesting it for
Server (minus the auto configuration part), but I can't tell where the
code is that alters the installer's autopartitioning behavior. The
kickstart file says it's using autopart, it doesn't have a breakdown
of what it's asking the installer to do, so I guess by virtue of it
being a Cloud productized installer it knows to do this.

Suggestions?

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [atomic-devel] Fedora Atomic Host Two Week Release Announcement

2016-01-29 Thread Chris Murphy
On Fri, Jan 29, 2016 at 7:29 AM, Micah Abbott <miabb...@redhat.com> wrote:
> AFAICT, the link for the updated ISO from
>
> https://getfedora.org/en/cloud/download/atomic.html
>
> ...is working properly this morning.
>
>
> In case one of the mirrors isn't up to speed yet, the direct link is:
>
> https://download.fedoraproject.org/pub/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.2.iso
>

OK it's working after deleting the browser cache. Sometimes I don't
know why things work the way they work. Before clearing the cache, the
browser was consistently requesting the old wrong name from mirrors
even though Fedora's servers were supplying the correct new filename.
It's like a 20+ year old bug that makes "clear your browser cache"
sound sane.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Fedora Atomic Host Two Week Release Announcement

2016-01-28 Thread Chris Murphy
On Wed, Jan 27, 2016 at 8:38 PM,  <nore...@fedoraproject.org> wrote:
>
> A new update of Fedora Cloud Atomic Host has been released and can be
> downloaded at:
>
> Images can be found here:
>
> https://getfedora.org/en/cloud/download/atomic.html


Clicking on the 64-bit Atomic ISO image results in:


http://mirrors.rit.edu/fedora/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso

404 - Not Found

http://mirrors.kernel.org/fedora-alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso
Sorry, we cannot find your kernels  ##which btw is awesome


http://dl.fedoraproject.org/pub/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso
Not Found
The requested URL
/pub/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso
was not found on this server.




-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Fedora Atomic Host Two Week Release Announcement

2016-01-28 Thread Chris Murphy
On Thu, Jan 28, 2016 at 1:19 PM, Matthew Miller
<mat...@fedoraproject.org> wrote:
> On Thu, Jan 28, 2016 at 12:01:34PM -0700, Chris Murphy wrote:
>> http://mirrors.rit.edu/fedora/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso
>
> Should be fixed now -- there was a missing ".2" in the filename. Reload
> the download page?

I'm redirected to a different mirror on each attempt, and those mirrors
still aren't working yet; I bet it'll take a couple of hours for them to
catch the update.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Fedora Atomic Host Two Week Release Announcement

2016-01-28 Thread Chris Murphy
On Thu, Jan 28, 2016 at 5:25 PM, Matthew Miller
<mat...@fedoraproject.org> wrote:
> On Thu, Jan 28, 2016 at 03:46:05PM -0700, Chris Murphy wrote:
>> >> http://mirrors.rit.edu/fedora/alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso
>> > Should be fixed now -- there was a missing ".2" in the filename. Reload
>> > the download page?
>> I'm redirected to a different mirror on each attempt, those mirrors
>> still aren't working yet, I bet it'll take a couple hours for them to
>> catch the update.
>
> Sorry, I wasn't clear -- the _mirrors_ are right, but the _link_ was
> bad. Should be
> https://download.fedoraproject.org/pub/alt/atomic/stable/Cloud-Images/x86_64/Images/Fedora-Cloud-Atomic-23-20160127.2.x86_64.qcow2

Except not qcow2 since this is for the 64-bit Atomic ISO link.

When I click that link, I'm taken to a different mirror each time. It's
still failing. I just clicked it now and was redirected to:

http://mirrors.kernel.org/fedora-alt/atomic/stable/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20160127.iso

And it says my kernels can't be found (giggle). So that mirror, at
least, isn't correct yet.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-28 Thread Chris Murphy
This is calling grub2-mkconfig at line 340
https://git.gnome.org/browse/ostree/tree/src/libostree/ostree-bootloader-grub2.c

And line 153 says this must have been called from a wrapper script.
I'm pretty much thinking grub2-mkconfig is not meant to be directly
called by the user either, and is envisioned to only get called by
e.g. ostree admin deploy/switch, or rpm-ostree rollback/upgrade, etc.
That's fine; it's just not obvious which user space tools are meant for
the user to call.


Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-23 Thread Chris Murphy
On Wed, Dec 23, 2015 at 2:37 PM, Jonathan Lebon <jle...@redhat.com> wrote:
> - Original Message -
>> OK part of my confusion is that 'grub2-mkconfig' does not work when
>> called directly. Doing that results in a malformed grub.cfg.
>>
>> What is the correct way, on atomic builds, to recreate the grub.cfg?
>> How does rpm-ostree do this?
>
> Yeah, I've been confused by that as well. I haven't bothered
> investigating more, but it seems like running grub2-mkconfig
> on a fresh boot works, whereas calling it after some
> rpm-ostree operation such as upgrade/rebase will cause no
> output from 15_ostree.

For me, now on 23.39, even after a fresh boot, and no matter where I
direct -o to write the file, 15_ostree is empty. It seems like it's not
meant to be called directly. I've never had that command produce a
correct grub.cfg from user space. With one exception, though, it does
produce correct grub.cfgs when run within rpm-ostree updates or
rollbacks. So I think something else is calling that script, and also
telling it where to put the grub.cfg (which goes in different locations
depending on the firmware, because rabbits).

I've variably gotten grub.cfgs with no menu entries, and ones with
linux16/initrd16 instead of linuxefi/initrdefi, and I can't tell why
this flips around, other than some kind of state change that only
happens when it's called the right way.



>
>> The closest I get to a command is 'ostree admin instutil
>> grub2-generate' but this fails with
>> **
>> ERROR:src/libostree/ostree-bootloader-grub2.c:154:_ostree_bootloader_grub2_generate_config:
>> assertion failed: (grub2_boot_device_id != NULL)
>> Aborted (core dumped)
>>
>> I'm not sure what it wants.
>
> It's only meant to be called by the /etc/grub.d/15_ostree script,
> which sets up some env vars for it. That said, it should probably
> error out more gracefully.

Well that sorta answers this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1293986

On a normal system /etc/default/grub is consumed by grub2-mkconfig, but
that's not true on atomic. I've made changes to that file and yet those
changes aren't rolled into the grub.cfg. So OK, there's 'ostree admin
instutil set-karg', but I run into this problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1293987

So now I have no idea whether 'ostree admin instutil' is in the user's
domain or just a helper for other scripts. So I think we need to know
which knobs are deprecated and which are the new ones.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: atomic host grub2 version

2015-12-22 Thread Chris Murphy
On Tue, Dec 22, 2015 at 1:31 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On two UEFI systems, one with F23 Workstation, the other with F23
> Cloud Atomic, I'm finding the grubx64.efi do not have the same hash,
> even though rpm -q reports the same rpm installed on both. This is
> unexpected.

I've found the sha256sum for /boot/efi/EFI/fedora/grubx64.efi on a
system with atomic tree version 23.38 matches that of the grubx64.efi
in grub2-efi-2.02-0.23.fc23.x86_64.rpm, despite the fact rpm -q
reports grub2-efi-2.02-0.25.fc23.x86_64 is installed. So the grub2-efi
package is disconnected from the actual efi binary installed.

UEFI bootloader is not updated by rpm-ostree, even though rpm package
version suggests otherwise
https://bugzilla.redhat.com/show_bug.cgi?id=1293725
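
For anyone who wants to reproduce the comparison, this is roughly the
procedure (a sketch; it assumes the .25 rpm has been fetched into the
working directory on some machine, and the package name is the one rpm
-q reports):

# sha256sum /boot/efi/EFI/fedora/grubx64.efi
# rpm2cpio grub2-efi-2.02-0.25.fc23.x86_64.rpm | cpio -idv './boot/efi/EFI/fedora/grubx64.efi'
# sha256sum ./boot/efi/EFI/fedora/grubx64.efi

If the two sums differ, the installed binary isn't the one the package
version claims to provide.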


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


atomic host grub2 version

2015-12-22 Thread Chris Murphy
On two UEFI systems, one with F23 Workstation, the other with F23
Cloud Atomic, I'm finding the grubx64.efi binaries do not have the
same hash, even though rpm -q reports the same rpm installed on both.
This is unexpected.

Does the atomic tree include /boot/efi/EFI/fedora? And if not, is that
on the future feature list?

CVE-2015-8370 is what made me look at this. On BIOS computers, whether
conventional or atomic, the grub2-2.02-0.25.fc23 update only replaces
the GRUB2 user space tools; the user has to manually run grub2-install
to actually fix the installed bootloader. On UEFI conventional
installations, grubx64.efi is replaced automatically when the RPM is
updated; but apparently not on UEFI atomic installations. Using
grub2-install there fails because grub2-efi-modules isn't installed by
default, and even if it were, the resulting grubx64.efi would no longer
be signed by Fedora, so it would fail UEFI Secure Boot code signing
checks.
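
On a BIOS machine the manual step is just this (a sketch; it assumes
/dev/sda is the disk the firmware boots from):

# grub2-install /dev/sda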



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-17 Thread Chris Murphy
OK really weird.

Tree 23.34 is deployed, and I just ran:

# rpm-ostree rollback

This writes out a correct grub.cfg (it uses linuxefi/initrdefi) and
puts it in the correct location (/boot/efi/EFI/fedora/grub.cfg). And
this grub.cfg works regardless of which menu entry I pick in GRUB.

So the bug affected rpm-ostree upgrade, and possibly only the version
of that command in the 23.29 tree.
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-16 Thread Chris Murphy
Successfully booted using 'configfile' and editing the grub.cfg to use
linuxefi/initrdefi instead of linux16 and initrd16...

# bash -x grub2-mkconfig

http://fpaste.org/301941/30720114/

That's a bug. I just don't know whose bug it is. This is definitely a
UEFI system; the CSM is not in use (efibootmgr works, and Secure Boot
is enabled). So something's got grub2-mkconfig awfully confused about
what kind of firmware this system has, and the -o destination path is
getting set incorrectly too.


Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-16 Thread Chris Murphy
At the grub menu, I hit 'c' to get to a grub shell, and then use the
'configfile' command pointed at /boot/loader.0/grub.cfg - it reads
that configuration file instead of the one at
/boot/efi/EFI/fedora/grub.cfg, and both the 23.29 and 23.34 tree menu
entries appear.

Both contain an error, however. They both use 'linux16' and 'initrd16'
instead of 'linuxefi' and 'initrdefi'. Something is very confused
about whether this is a BIOS or UEFI system.

If I change those commands to linuxefi and initrdefi, I can boot either tree.
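
For anyone else stuck at the same spot, the whole workaround is roughly
this (a sketch; (hd0,gpt2) is an assumption for whatever GRUB calls the
ext4 /boot partition, 'ls' at the grub prompt shows the candidates):

grub> ls
grub> configfile (hd0,gpt2)/loader.0/grub.cfg

Then in the menu that appears, press 'e' on an entry, change linux16 to
linuxefi and initrd16 to initrdefi, and press Ctrl-x to boot.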

There is something... maybe. My fstab looks like this:

UUID=908cb4df-410b-47e4-afb1-872255bd1244 /boot   ext4
   defaults1 2
UUID=5956-63D8  /boot/efi   vfat
umask=0077,shortname=winnt,x-systemd.automount,noauto 0 2
UUID=8b0c4840-4fc7-4782-a4c0-25fec8a40dd4 /btrfsdefaults 0 0

Normally grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg run manually
causes /boot/efi to be mounted automatically in a split second. So, is
rpm-ostree looking to see first if /boot/efi exists for any reason?
What determines whether grub2-mkconfig -o is directed to
/boot/efi/EFI/fedora, vs /boot/grub2? Thing is, there is no
/boot/grub2/grub.cfg at all... neither of the correct locations got a
grub.cfg. The correct grub.cfg (minus the wrong linux command) is in
/boot/loader.0.

Wonky.


Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: rpm-ostree, failed upgrade, failed rollback

2015-12-16 Thread Chris Murphy
rpm-ostree entry .conf

http://fpaste.org/301944/45030746/

What translates this file's linux/initrd into either linux16/initrd16
vs linuxefi/initrdefi?

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-15 Thread Chris Murphy
On Tue, Dec 15, 2015 at 5:03 PM, Matthew Miller
<mat...@fedoraproject.org> wrote:
> On Wed, Dec 16, 2015 at 10:52:43AM +1100, Philip Rhoades wrote:
>> >A. Separate hardware/virt trees; have the installer ISO point at the
>> >   hardware one by default (but also have the option of virt)
>> >B. Finishing Atomic overlay support; making hardware enablement an
>> >overlay
>> >C. Getting all this stuff to work properly in SPCs
>> >D. Something else?
>> >
>> >* so is software. *sigh*
>> Does this mean there would be different hardware trees on the iso or
>> that a basic iso would be pulling the appropriate tree via the
>> network?
>
> Well, for "A", I was thinking one for hardware, one for virt/cloud —
> not going down the path of different trees for different types of
> hardware, because that's definitely the road to madness.
>
> For "B" (which is only theoretical, and as someone mentioned, may
> require cloning Colin), there could be different overlays depending on
> needs.
>
>> What are SPCs?
>
> Super-privileged containers. Basically, containers that are meant to
> manage the host OS. See
> https://www.youtube.com/watch?v=eJIeGnHtIYg from DevConf.cz last year.


Between ostree, spcs, fs options, and overlays, I think this is a lot
to chew on, and a lot of change in a short amount of time. That goes
for the people doing the work, those who will have to document the
differences compared to conventional installs+setup+management, and
the users who will have to learn all of this.

A persistent spc to log in to, to manage the host, is problematic for a
significant minority of use cases where the storage hardware changes
and the container (currently) isn't aware of this as far as some tools
are concerned. So I think that needs more investigation and fixes so
that we're not having to document exceptions.

It seems to me the easier thing to do is tolerate baking more stuff
into the images and ISO. Growing that list now and shrinking it later
is a better understood process, can be done faster, and requires fewer
resources. And by later, I mean once spcs and overlay stuff are (a)
more mature, (b) better understood, and (c) something the people doing
that work have time for.

The hardware specific utils could go in a metapackage that's enabled
for installation by default only on the ISO, and not for images.

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Fedora cloud image feedback

2015-12-15 Thread Chris Murphy
On Tue, Dec 15, 2015 at 9:33 PM, Deepak Shetty <dpkshe...@gmail.com> wrote:

> Also since cloud image is the most downloaded one, how about
> providing a .iso file with pre-loaded user/passwd, so that people
> willing to use cloud-image in non-cloud env (local, virt-mgr etc) can use
> the iso file as the cloud-init data source ?

There is an ISO that can be used to install Fedora Cloud using
Anaconda. I suggest using automatic partitioning (avoids problems due
to some missing pieces to support custom layouts).

https://getfedora.org/en/cloud/download/
In the center of the screen, click on Atomic Images; on the right side
is the ISO option. This is an atomic host system, no dnf (except in
containers based on images that use it).

Another option is to download the Workstation or Server *netinstall*
ISO and click on Software Selection (in the hub of the installer);
"Fedora Cloud Server" is an option in there. This is a conventionally
updated (with dnf) system, not an atomic host system.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-15 Thread Chris Murphy
On Tue, Dec 15, 2015, 5:49 PM Chris Murphy <li...@colorremedies.com> wrote:



It seems to me the easier thing to do is tolerate baking more stuff
into the images and ISO.

OK I just rewound this in my head, and said WTS out loud. There's one cloud
atomic tree, right? So adding a bunch of hardware stuff affects that whole
tree, and everything that uses it.

OK, instead: more clarity on the downloads page about the limitations
of the atomic ISO on baremetal, and offer the Server ISO or netinstall
media (non-atomic install) with the Cloud Server option in the
installer for a more complete and flexible install on hardware.

I do still wonder about decoupling the kernel from the tree. Kernel
regressions happen.

Chris Murphy


Growing that list now, and shrinking it later
is a better understood process, can be done faster, and requires fewer
resources. And by later, I mean once spcs and overlay stuff are a.
more mature b. better understood c. people doing that work have time
to do it.

The hardware specific utils could go in a metapackage that's enabled
for installation by default only on the ISO, and not for images.

--
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-14 Thread Chris Murphy
On Sun, Dec 13, 2015 at 8:29 PM, Chris Murphy <li...@colorremedies.com> wrote:

> It's also best practices to disable the write cache on all drives used
> in any kind of RAID.

More important, all drives in a RAID need SCT ERC set on each drive,
which is also not persistent on non-enterprise drives. That requires
smartctl -l scterc,70,70 <dev> on each one, otherwise read failures
don't always get fixed correctly, fester, and can needlessly result in
the RAID degrading or failing.
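
Since the setting doesn't survive a power cycle on desktop drives, it
has to be reapplied every boot, something like this (a sketch; it
assumes sdb through sde are the array members):

# for d in /dev/sd[b-e]; do smartctl -l scterc,70,70 $d; done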

And now I see mdadm is not installed either on the ISO.

I filed this bug to get dosfstools included, mainly for UEFI systems.
https://bugzilla.redhat.com/show_bug.cgi?id=1290575

Should I just file bugs like that for each one of these other missing
components, and set them to block a tracker bug for things to include
on the ISO? In my view a baremetal installation is first a server, so
it should have basic server tools.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-13 Thread Chris Murphy
OK at this moment I'm thinking hdparm and smartmontools just need to
go on the ISO, along with iotop.

While both hdparm and smartmontools appear to work OK in a container
with --privileged=true, any hardware changes are not reflected in that
container in a way these two programs can see.

[root@3d2386bbd250 /]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00 931.5G  0 disk
|-sda1   8:10   200M  0 part
|-sda2   8:20   500G  0 part /etc/hosts
|-sda3   8:30   500M  0 part
|-sda4   8:40 426.5G  0 part
`-sda5   8:50   4.3G  0 part [SWAP]

**plug in some drives***

[root@3d2386bbd250 /]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00 931.5G  0 disk
|-sda1   8:10   200M  0 part
|-sda2   8:20   500G  0 part /etc/hosts
|-sda3   8:30   500M  0 part
|-sda4   8:40 426.5G  0 part
`-sda5   8:50   4.3G  0 part [SWAP]
sdb  8:16   0 698.7G  0 disk
sdc  8:32   0 465.8G  0 disk
sdd  8:48   0 698.7G  0 disk
sde  8:64   0 465.8G  0 disk

[root@3d2386bbd250 /]# hdparm -I /dev/sdb
/dev/sdb: No such file or directory
[root@3d2386bbd250 /]# hdparm -I /dev/sdc
/dev/sdc: No such file or directory
[root@3d2386bbd250 /]# hdparm -I /dev/sdd
/dev/sdd: No such file or directory
[root@3d2386bbd250 /]# hdparm -I /dev/sde
/dev/sde: No such file or directory
[root@3d2386bbd250 /]# smartctl -a /dev/sde
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.6-301.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sde [SAT] failed: No such device

Maybe lsblk, by virtue of libblkid, gets some state update for free, I
don't know. Clearly that's not the case for hdparm and smartctl, and
therefore I have to restart the container or start a new one for the
change to be visible to these tools. If I replace or add drives, will
I need to restart the container running smartd? If yes, that'd kinda
be a regression. Maybe I'm doing something wrong, but at the moment
I'm not grokking the advantage of running these tools in a container.


Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-13 Thread Chris Murphy
On Sun, Dec 13, 2015 at 7:07 PM, Joe Brockmeier <j...@redhat.com> wrote:
> On 12/13/2015 07:11 PM, Chris Murphy wrote:
>> OK at this moment I'm thinking hdparm and smartmontools just need to
>> go on the ISO, along with iotop.
>
> What's the usage scenario you're picturing here?  This feels to me like a
> "pet" usage scenario where you're caring a whole lot about a single
> server install.

Any server with any number of drives.

Best practice is to have smartd monitor drive health and report
failures by email or text, rather than via a service disruption or an
irate human. While smartd could be running in a container, if the
container doesn't get state updates when drives are swapped or added,
then that requires a workaround: periodically restarting that
container. So what's the advantage of running this utility in a
container?

It's also best practice to disable the write cache on all drives used
in any kind of RAID. That's not a persistent setting, so it has to
happen every boot. Instead of a boot script or service that does this,
a container needs to start up shortly after each boot and do this.
What's the benefit of that workflow change? I don't understand it.
Another use of hdparm is ATA secure erase before decommissioning
drives.
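
Back on the write cache point, the whole per-boot task outside a
container is a one-liner (a sketch; it assumes sdb through sde are the
RAID members):

# hdparm -W 0 /dev/sd[b-e]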

If the container not being fully aware of state changes is a bug, then
that's fine. In that case a super-privileged container running
persistently with sshd can then be used to do all
these things. But I still don't know what the advantage is, having to
remote into that container for some tasks, and into the host itself
for other tasks. Don't you think there should be some considerable
advantage, commensurate with that workflow change, to relocating simple
tools commonly available on servers into containers only? I do.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Cloud Atomic ISO, missing items for baremetal usage

2015-12-12 Thread Chris Murphy
On Fri, Dec 11, 2015 at 12:33 PM, Joe Brockmeier <j...@redhat.com> wrote:
> On 12/11/2015 02:23 PM, Chris Murphy wrote:
>>
>> These I have running in a fedora container. lspci mostly works, but
>> getting full -vvnn detail requires --privileged=true. And the other
>> three require it. iotop additionally needs --net=host. I'd be OK with
>> them just being available in a container, but it might make more sense
>> to just include them in the atomic ISO installation, maybe even
>> borrowing a list from the Server product?
>
> We want, as much as possible, to keep the image small and run all the
> things in containers where possible.
>
> If there's something where that just won't work, or is ludicrously
> difficult, we should discuss including it.

I think these may be needed in the ISO:

cryptsetup - needed to boot encrypted devices
rng-tools - this includes rngd, seems useful for all containers esp in
a cloud context. Even with --privileged=true I get:

# systemctl start rngd
Failed to get D-Bus connection: Operation not permitted
# systemctl status rngd
Failed to get D-Bus connection: Operation not permitted

Also, a way to separate kernels from the rest of the current tree.
Right now I'm on atomic 23.29, and the previous tree I have installed
goes all the way back to 23 (because it's an ISO installation), but I'm
encountering a kernel regression. It's very suboptimal to have to roll
back everything to 23, rather than just the kernel. Stepping the
kernel forward independently from the cloud atomic host tree is maybe
even better in some instances than rolling back.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


PTY allocation request failed on channel 0

2015-12-12 Thread Chris Murphy
Has anyone else run into this? I've never run into this problem
before, not on any version of Server or Workstation. Just Cloud, and
I've run into it several times in just a few days.

I'm using public-key authentication to ssh into a Fedora 23 Cloud
Atomic ISO installation upgraded to 23.29. This works most of the
time, until it doesn't and then I always get this.

[chris@f23m ~]$ ssh chris@10.0.0.15
PTY allocation request failed on channel 0

Even if I happen to have an existing login available and restart sshd
from it, the problem isn't fixed. The only fix I've found so far is a
reboot, which is more than a bit disruptive. Any ideas?

In the journal on the host side, it records a bunch of audit stuff, but
three lines seem particularly relevant yet not illuminating:

Dec 12 12:37:10 f23a.localdomain audit[1]: USER_AVC pid=1 uid=0
auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
msg='Unknown permission start for class system
exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
[snip]
Dec 12 12:27:25 f23a.localdomain sshd[2029]: error: openpty: No such
file or directory
Dec 12 12:27:25 f23a.localdomain sshd[2032]: error: session_pty_req:
session 0 alloc failed

systemd-222-8.fc23.x86_64

I'd file a bug but I don't even know what to file it against. The full
journal output is here:
http://fpaste.org/3001



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: PTY allocation request failed on channel 0

2015-12-12 Thread Chris Murphy
I have a lead.

I'm still working on this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1290691


And when I do this:
[root@f23a ~]# docker run --net=host --pid=host -v /dev:/dev
--privileged=true fedext /usr/sbin/iotop -d3
Traceback (most recent call last):
  File "/usr/sbin/iotop", line 17, in 
main()
  File "/usr/lib/python2.7/site-packages/iotop/ui.py", line 631, in main
main_loop()
  File "/usr/lib/python2.7/site-packages/iotop/ui.py", line 621, in 
main_loop = lambda: run_iotop(options)
  File "/usr/lib/python2.7/site-packages/iotop/ui.py", line 508, in run_iotop
return curses.wrapper(run_iotop_window, options)
  File "/usr/lib64/python2.7/curses/wrapper.py", line 22, in wrapper
stdscr = curses.initscr()
  File "/usr/lib64/python2.7/curses/__init__.py", line 33, in initscr
fd=_sys.__stdout__.fileno())
_curses.error: setupterm: could not find terminal


After that traceback, any attempt to ssh to the host is munged as
previously described. So somehow that docker command puts the host in
a state where subsequent ssh attempts fail.

Stopping docker and sshd, then starting sshd then docker, doesn't
help. I still can't log in.

So is this a docker bug?


Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: PTY allocation request failed on channel 0

2015-12-12 Thread Chris Murphy
On Sat, Dec 12, 2015 at 1:31 PM, Chris Murphy <li...@colorremedies.com> wrote:
> I have a lead.
>
> I'm still working on this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1290691
>
>
> And when I do this:
> [root@f23a ~]# docker run --net=host --pid=host -v /dev:/dev

OK perfect. It's user error. The -v /dev:/dev wasn't meant to be
literal, but rather /dev/sda:/dev/sda, or whatever. So if I stop doing
that nonsense, the login breakage no longer happens.
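
In other words, something along these lines, which leaves the host's
/dev alone (a sketch; sda is just the one drive I care about here):

# docker run --rm --privileged=true -v /dev/sda:/dev/sda fedext /usr/sbin/smartctl -a /dev/sda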



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: root on btrfs and lvmthinp for docker backing confusion

2015-12-11 Thread Chris Murphy
On Thu, Dec 10, 2015 at 9:18 PM, Dusty Mabe <du...@dustymabe.com> wrote:
>
>
> On 12/10/2015 08:35 PM, Chris Murphy wrote:

>> Followup question: Does Docker directly use a thin pool without
>> creating (virtual size) logical volumes? Because I don't see any other
>> LV's created, and no XFS filesystem appears on the host using the
>> mount command. And yet I see XFS mount and umount kernel messages on
>> the host. This is sort of an esoteric question. However, I have no
>> access to container files from the host like I can see inside each
>> btrfs subvolume when btrfs is the backing method. And that suggests
>> possibly rather different backup strategies depending on the backing.
>
> I believe it chops it up using low level device mapper stuff. I think
> you don't see the mounts on your host because they are in a different
> mount namespace (part of the magic behind containers).
>
> For more info on docker + device mapper look at slides 37-44 of [1]
>
> [1] - http://www.slideshare.net/Docker/docker-storage-drivers

I read all the slides. That is really helpful, there's quite a bit of
detail considering they're slides.

It's definitely more devicemapper than LVM based (makes sense, the
driver is "devicemapper" after all). The most that appears in LVM's
view is the thin pool, and once Docker owns it, LVM can't make virtual
LV's from that pool. As to the obscurity, on the one hand it's a
perception because while I'm quite comfortable with LVM tools, I'm not
that comfortable with dmsetup; and on the other hand the local backing
should probably be considered disposable, without warning, in a
production setup anyway. So some regular sweep of container states (if
that's important) should be made into images and put elsewhere.
Seriously, if the backing store were to faceplant, it's simply faster
to start from the most recent image than attempt repairs.
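
That sweep can be as small as a commit plus a save per container (a
sketch; the names here are made up):

# docker commit mycontainer backup/mycontainer:20151211
# docker save backup/mycontainer:20151211 | xz > /backup/mycontainer-20151211.tar.xz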

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Cloud Atomic ISO, missing items for baremetal usage

2015-12-11 Thread Chris Murphy
Hi,
Is there a wish list for things to consider adding to a future ISO?
These are things not on the current ISO that often are on other
baremetal installs like Workstation and Server products. I don't have
enough familiarity to say whether these things should just be included
in the base installation, or gathered into one or more "util" or
"extras" type Fedora docker images. So far I'm running into:


/lib/firmware/ is read-only so I can't add this:
[   14.599501] iwlwifi :02:00.0: request for firmware file
'iwlwifi-7265D-13.ucode' failed.

I don't know whether bind mounting a /var/lib/firmware onto
/lib/firmware can be done soon enough for it to be picked up by the
kernel.
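
The idea would be something like this fstab line (or an equivalent
mount unit), though whether it takes effect early enough for the
kernel's firmware loader is exactly the open question:

/var/lib/firmware  /lib/firmware  none  bind  0 0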


pciutils, which contains lspci
hdparm
smartmontools
iotop

These I have running in a fedora container. lspci mostly works, but
getting full -vvnn detail requires --privileged=true. And the other
three require it. iotop additionally needs --net=host. I'd be OK with
them just being available in a container, but it might make more sense
to just include them in the atomic ISO installation, maybe even
borrowing a list from the Server product?
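
For reference, the sort of invocations these need (a sketch; 'fedtools'
is hypothetical, just a fedora based image with these packages added):

# docker run --rm -it --privileged=true fedtools lspci -vvnn
# docker run --rm -it --privileged=true --net=host fedtools iotop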






-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: root on btrfs and lvmthinp for docker backing confusion

2015-12-10 Thread Chris Murphy
On Thu, Dec 10, 2015 at 9:54 AM, Dusty Mabe <du...@dustymabe.com> wrote:

> So you created the VG VG and the docker-pool LV on your own before
> docker-storage-setup is run?

Yes.

> What I would do is leave sda4 blank and then put the following in your
> config file:
>
> DEVS=/dev/sda4
> VG=vgdocker

Fails with

Dec 10 11:10:13 f23a.localdomain docker-storage-setup[1135]: Partition
specification unsupported at this time.


> What I think this will do is create a PV out of /dev/sda4 and create a
> VG (named vgdocker) on top of it. It will then create the docker-pool
> LV for you and have the docker daemon use that as the backing store.
>
> Let me know if this is what you were looking for or not!

Yes.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: root on btrfs and lvmthinp for docker backing confusion

2015-12-10 Thread Chris Murphy
On Thu, Dec 10, 2015 at 11:47 AM, Jason Brooks <jbro...@redhat.com> wrote:

> You can ignore docker-storage-setup and edit /etc/sysconfig/docker-storage
> yourself.

OK I've done "systemctl disable docker-storage-setup"

>
> Here's what it looks like from the f23 vagrant box:
>
> DOCKER_STORAGE_OPTIONS=-s devicemapper --storage-opt dm.fs=xfs --storage-opt 
> dm.thinpooldev=/dev/mapper/atomicos-docker--pool --storage-opt 
> dm.use_deferred_removal=true


Short version: This works. Docker service starts, no errors, and it's
not using a loopback device.

Followup question: Does Docker directly use a thin pool without
creating (virtual size) logical volumes? Because I don't see any other
LV's created, and no XFS filesystem appears on the host using the
mount command. And yet I see XFS mount and umount kernel messages on
the host. This is sort of an esoteric question. However, I have no
access to container files from the host like I can see inside each
btrfs subvolume when btrfs is the backing method. And that suggests
possibly rather different backup strategies depending on the backing.


Long version of the question:


# systemctl stop docker

Copied generic docker-storage and edited as follows:

DOCKER_STORAGE_OPTIONS=-s devicemapper --storage-opt dm.fs=xfs
--storage-opt dm.thinpooldev=/dev/mapper/vgfedora-docker--pool
--storage-opt dm.use_deferred_removal=true

# systemctl start docker
Dec 10 13:05:02 f23a.localdomain systemd[1]: Starting Docker
Application Container Engine...
Dec 10 13:05:06 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:06.377269077-07:00" level=info msg="Firewalld
running: false"
Dec 10 13:05:06 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:06.791189262-07:00" level=info msg="Default
bridge (docker0) is assigned with an IP address 172.17.0.1/16. Daemon
option --bip can be used to set a preferred IP address"
Dec 10 13:05:07 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:07.572415354-07:00" level=info msg="Loading
containers: start."
Dec 10 13:05:07 f23a.localdomain docker[1695]: ..
Dec 10 13:05:07 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:07.576094399-07:00" level=info msg="Loading
containers: done."
Dec 10 13:05:07 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:07.576663930-07:00" level=info msg="Daemon has
completed initialization"
Dec 10 13:05:07 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:07.576770908-07:00" level=info msg="Docker
daemon" commit=f7c1d52-dirty execdriver=native-0.2
graphdriver=devicemapper version=1.9.1-fc23
Dec 10 13:05:07 f23a.localdomain docker[1695]:
time="2015-12-10T13:05:07.577004141-07:00" level=info msg="API listen
on /var/run/docker.sock"
Dec 10 13:05:07 f23a.localdomain systemd[1]: Started Docker
Application Container Engine.

host# docker pull fedora
host# docker images
REPOSITORY  TAG IMAGE IDCREATED
 VIRTUAL SIZE
fedora  latest  597717fc21bd2 weeks
ago 204 MB

So that all works, and then also the thin pool data% is growing after
each step, according to lvs. But there is no logical volume, no file
system.

host# docker run -i -t fedora /bin/bash

host# mount | grep xfs
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
host# mount | grep ext
/dev/sda3 on /boot type ext4 (rw,relatime,seclabel,stripe=4,data=ordered)

It's working. But is Docker directly using the thin pool without
creating a thin logical volume and file system? That's unexpected.

host# lvs
  LV  VG   Attr   LSize   Pool Origin Data%  Meta%
Move Log Cpy%Sync Convert
  docker-pool vgfedora twi-aotz-- 250.00g 0.11   0.53
host# docker pull ubuntu
[...snip pull output...]
-bash-4.3# lvs
  LV  VG   Attr   LSize   Pool Origin Data%  Meta%
Move Log Cpy%Sync Convert
  docker-pool vgfedora twi-aotz-- 250.00g 0.20   0.63

Yeah, it seems to directly use the thin pool without creating a
virtual size LV first. Kernel messages show a bunch of items like this:

[ 7818.790685] XFS (dm-4): Mounting V5 Filesystem
[ 7819.038211] XFS (dm-4): Ending clean mount
[ 7834.219132] XFS (dm-4): Unmounting Filesystem

So it's using XFS on something that isn't appearing in mount on the
host or in the container. And multiple containers all appear to use
XFS on the same device mapper device (dm-4), which does not appear on
the host in the /dev/ directory. So this is really... obscured.
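
Following up on the obscurity point, the thin devices Docker carves out
of the pool do show up one layer down, with dmsetup rather than the LVM
tools (a sketch):

# dmsetup ls --tree
# dmsetup table vgfedora-docker--pool

That at least confirms where dm-4 is coming from, even though LVM never
shows it.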

-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-09 Thread Chris Murphy
On Tue, Dec 8, 2015 at 11:22 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:
> On Tue, 2015-12-08 at 11:26 -0700, Chris Murphy wrote:
>> On Mon, Dec 7, 2015 at 6:39 PM, Adam Williamson
>> <adamw...@fedoraproject.org> wrote:
>> > Hi folks! For those who aren't aware, Fedora openQA is set up to test
>> > the Atomic installer image nightly - that's the image that uses
>> > anaconda to deploy a fixed Atomic host payload.
>>
>> It exists! I've been looking around for such a thing for a day and
>> only found old blog posts. It's really non-obvious how to find it in
>> koji. I can find lives. I can find other atomics. But somehow this one
>> is like the G train in Brooklyn (OK the G *eventually* does show
>> itself).
>
> It's not built by koji, which is why you can't find it there. It's an
> installer image, it's built by pungi, just like netinst and DVD images.
>
> fedfind can find it, though. ;) That's what fedfind does! It finds
> fed(ora)!

I think it'd be helpful if the releng dashboard listed this build along
with the other cloud images. I estimate my recall half-life for
fedfind is about 15 days. Does anyone else think it'd be useful to
have the nightly atomic installer ISO listed at
https://apps.fedoraproject.org/releng-dash/ ? Or maybe even an Atomic
specific section on the dashboard?




>> > https://openqa.fedoraproject.org/ , you should see a build like
>> > 'Build23_Postrelease_20151207' on the front page,
>>
>> OK I click on this, it shows the test, and that it failed. Any chance
>> of it eventually linking to the image it tested so that it's possible
>> to fall back to a manual test
>
> So, answer one: it actually does. The ISO tested is on the Logs &
> Assets page, down at the bottom, under Assets.

When I click on Postrelease_20151209, I end up here
https://openqa.fedoraproject.org/tests/overview?distri=fedora=23=23_Postrelease_20151209=1

There isn't anything down at the bottom, definitely no Logs & Assets
page. Same with the other listings on the main openqa page...
https://drive.google.com/open?id=0B_2Asp8DGjJ9cUpnY2ZuY0lIV2M


>>  More likely is if it passes all auto tests to have a link to
>> the image so a manual test can try to blow it up, right?
>
> Eh, my take is that we don't/shouldn't exactly design our manual test
> processes around openQA. This testing (of the post-release nightly
> cloud images) is kind of a bonus thing I rigged up just because
> maxamillion asked and it wasn't too difficult; the main point of openQA
> is to aid in pre-release testing, and of course we have a more
> developed test process there, where we have the regular 'nomination' of
> nightly composes for manual testing, with the wiki pages with download
> links and all the rest of it. We could certainly stand to draw up a
> proper process for manual testing of post-release images, if we're
> going to be releasing them officially, which apparently we are, but I'm
> not the guy who's been keeping up on that stuff so I don't want to leap
> in, I'm sure some folks already have ideas for doing that.

Gotcha. Thanks!


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-09 Thread Chris Murphy
On Wed, Dec 9, 2015 at 3:00 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:

> That's an overview page. The Logs & Assets tab is available for each
> individual *test* page. Here's the overview for today:
>
> https://openqa.fedoraproject.org/tests/overview?distri=fedora=23=23_Postrelease_20151209=1
>
> Click on the green dot and you see the individual test:

OOH - OK, it's not at all obvious the dot is a link. Is it possible
for the test text itself to carry the link?



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-09 Thread Chris Murphy
On Wed, Dec 9, 2015 at 2:02 PM, Joe Brockmeier <j...@redhat.com> wrote:
> On 12/09/2015 03:48 PM, Chris Murphy wrote:
>> Does anyone else think it'd be useful to
>> have the nightly atomic installer ISO listed at
>> https://apps.fedoraproject.org/releng-dash/ ? Or maybe even an Atomic
>> specific section on the dashboard?
>
> Yes! I totally think it would.

https://fedorahosted.org/fedora-infrastructure/ticket/5026

So that probably needs more clarity, now that I've already submitted
it. I asked for a Cloud specific section rather than Atomic specific.
So if Atomic specific is better organization, add that to the ticket.


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-09 Thread Chris Murphy
On Wed, Dec 9, 2015 at 3:44 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Wed, Dec 9, 2015 at 3:00 PM, Adam Williamson
> <adamw...@fedoraproject.org> wrote:
>
>> That's an overview page. The Logs & Assets tab is available for each
>> individual *test* page. Here's the overview for today:
>>
>> https://openqa.fedoraproject.org/tests/overview?distri=fedora=23=23_Postrelease_20151209=1
>>
>> Click on the green dot and you see the individual test:
>
> OOH - OK it's not at all obvious the dot is a link. Is it possible for
> the test text itself having the link?

Or even the word "Details" to the right of the dot, which is the link.
-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: [cloud] #144: f23 atomic iso configures docker loopback storage

2015-12-09 Thread Chris Murphy
On Wed, Dec 9, 2015 at 12:26 PM, Fedora Cloud Trac Tickets
<cloud-t...@fedoraproject.org> wrote:
> #144: f23 atomic iso configures docker loopback storage
>  I looked over the kickstarts, but I'm not clear on what might be causing
>  this.
> Ticket URL: <https://fedorahosted.org/cloud/ticket/144>

I'm replying to the list to avoid cluttering the ticket. I'm in the
vicinity of a related issue, trying to figure out what's doing the
setup, because I didn't use the prescribed automatic partitioning
layout and therefore don't have any lvmthinp stuff set up yet. The ISO
seems to create a system that's somehow making assumptions, but I
don't know where it's getting that info.

For example, I'm running into this:

● docker-storage-setup.service loaded failed failedDocker
Storage Setup
● docker.service  loaded failed failed
Docker Application Container Engine

The cause for storage setup service failing is
Dec 09 16:57:46 f23a.localdomain docker-storage-setup[759]: Volume
group "sda2" not found
Dec 09 16:57:46 f23a.localdomain docker-storage-setup[759]: Cannot
process volume group sda2
Dec 09 16:57:46 f23a.localdomain docker-storage-setup[759]: Metadata
volume docker-poolmeta already exists. Not creating a new one.
Dec 09 16:57:46 f23a.localdomain docker-storage-setup[759]: Please
provide a volume group name
Dec 09 16:57:46 f23a.localdomain docker-storage-setup[759]: Run
`lvcreate --help' for more information.

But where is it thinking there'd be a VG called sda2?

OK so I do
[chris@f23a ~]$ cat /usr/lib/systemd/system/docker-storage-setup.service
[Unit]
Description=Docker Storage Setup
After=cloud-final.service
Before=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker-storage-setup
EnvironmentFile=-/etc/sysconfig/docker-storage-setup

[Install]
WantedBy=multi-user.target


That suggests looking at /etc/sysconfig/docker-storage-setup, which
does not exist (yet?). I also don't know the significance of the '-'
right after the '=' on that line. There is a /etc/sysconfig/docker-storage
file that contains:

# This file may be automatically generated by an installation program.

# By default, Docker uses a loopback-mounted sparse file in
# /var/lib/docker.  The loopback makes it slower, and there are some
# restrictive defaults, such as 100GB max storage.

# If your installation did not set a custom storage for Docker, you
# may do it below.

# Example: Use a custom pair of raw logical volumes (one for metadata,
# one for data).
# DOCKER_STORAGE_OPTIONS = --storage-opt
dm.metadatadev=/dev/mylogvol/my-docker-metadata --storage-opt
dm.datadev=/dev/mylogvol/my-docker-data

DOCKER_STORAGE_OPTIONS=


So it might be that the Fedora 23 Atomic ISO behavior is the result of
upstream behavior, and Fedora 22 had a modifier that no longer exists
in Fedora 23?


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-09 Thread Chris Murphy
On Wed, Dec 9, 2015 at 3:47 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Wed, Dec 9, 2015 at 2:02 PM, Joe Brockmeier <j...@redhat.com> wrote:
>> On 12/09/2015 03:48 PM, Chris Murphy wrote:
>>> Does anyone else think it'd be useful to
>>> have the nightly atomic installer ISO listed at
>>> https://apps.fedoraproject.org/releng-dash/ ? Or maybe even an Atomic
>>> specific section on the dashboard?
>>
>> Yes! I totally think it would.
>
> https://fedorahosted.org/fedora-infrastructure/ticket/5026
>
> So that probably needs more clarity, now that I've already submitted
> it. I asked for a Cloud specific section rather than Atomic specific.
> So if Atomic specific is better organization, add that to the ticket.

Fedfind finds these cloud specific products built nightly. The last
one, Docker, is in its own category, I guess. So what items in this
listing make sense to list in a hypothetical Cloud specific heading on
the releng dashboard? Or should it only list Atomic specific builds?

https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/i386/Images/Fedora-Cloud-Base-23-20151209.i386.qcow2
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/i386/Images/Fedora-Cloud-Base-23-20151209.i386.raw.xz
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Atomic-23-20151209.x86_64.qcow2
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Atomic-23-20151209.x86_64.raw.xz
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Atomic-Vagrant-23-20151209.x86_64.vagrant-libvirt.box
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Atomic-Vagrant-23-20151209.x86_64.vagrant-virtualbox.box
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Base-23-20151209.x86_64.qcow2
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Base-23-20151209.x86_64.raw.xz
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Base-Vagrant-23-20151209.x86_64.vagrant-libvirt.box
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud-Images/x86_64/Images/Fedora-Cloud-Base-Vagrant-23-20151209.x86_64.vagrant-virtualbox.box
https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Cloud_Atomic/x86_64/iso/Fedora-Cloud_Atomic-x86_64-23-20151209.iso

https://dl.fedoraproject.org/pub/alt/atomic/testing/23-20151209/Docker/x86_64/Fedora-Docker-Base-23-20151209.x86_64.tar.xz


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: openQA nightly testing of Cloud Atomic installer image

2015-12-08 Thread Chris Murphy
On Mon, Dec 7, 2015 at 6:39 PM, Adam Williamson
<adamw...@fedoraproject.org> wrote:
> Hi folks! For those who aren't aware, Fedora openQA is set up to test
> the Atomic installer image nightly - that's the image that uses
> anaconda to deploy a fixed Atomic host payload.

It exists! I've been looking around for such a thing for a day and
only found old blog posts. It's really non-obvious how to find it in
koji. I can find lives. I can find other atomics. But somehow this one
is like the G train in Brooklyn (OK the G *eventually* does show
itself).

> https://openqa.fedoraproject.org/ , you should see a build like
> 'Build23_Postrelease_20151207' on the front page,

OK, I click on this, it shows the test, and that it failed. Any chance
of it eventually linking to the image it tested, so that it's possible
to fall back to a manual test? I don't know if that's even useful.
If it fails the auto test, all that probably matters is the fail
summary. More likely, if it passes all the auto tests, it'd be useful
to have a link to the image so a manual test can try to blow it up,
right?


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
http://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org


Re: Two-Week Atomic actual deliverables

2015-09-11 Thread Chris Murphy
On Fri, Sep 11, 2015 at 10:59 AM, Adam Miller
<maxamill...@fedoraproject.org> wrote:

> I'm pretty neutral on B or C. I don't really care and also don't think it
> should even remotely be a concern of ours. Not only do we not have
> testing for it but we don't even have the building blocks in place to
> work towards testing it. VirtualBox is bad and those who use it should
> feel bad.[0]

I'm curious what you think others should feel when they use VMware
ESXi or Fusion, or Microsoft Hyper-V, in particular as it compares to
the feeling they should have when using VirtualBox?

On Windows and OS X, there is no qemu+kvm+libvirt. So I see VirtualBox
as the least bad option on those platforms. When I'm using Fedora I
use vmm/virsh because, well yeah VirtualBox is like the booger I can't
flick off on OS X, meanwhile on Fedora there's something better.


> This is probably not a popular opinion and I'm fine with that, but we
> would have to install something that we very publicly speak out
> against in order to test this. I'm not yet ready to throw out Fedora's
> values for the sake of some OS X user's convenience but that's just
> me.

OK, well, the UX of Linux on Macs is highly variable, between totally
utterly frustrating shit and semi-tolerable except for exhibits A, B,
C, and D, all of which suck. The incentive, therefore, is to just run
proprietary OS X on proprietary hardware with VirtualBox, instead of
yet more proprietary crap, in order to semi-sanely run something
that's not crap or proprietary without having to buy additional
hardware and all the costs that ensue.

*shrug*

It's sorta like playing cards and telling someone they should feel bad
about the hand they've been dealt. Their choice was really limited to
showing up at a particular game in a particular location, not the
details of the hand they're dealt.



-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/cloud
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct


Cloud (_Atomic) selinux labels and restorecon

2015-09-01 Thread Chris Murphy
FYI:
restorecon changes many file labels following a clean install
https://bugzilla.redhat.com/show_bug.cgi?id=1259018

This bug is not Cloud specific, but because Cloud_Atomic is read-only
it can't be fixed with restorecon. I mention this in the bug.

I don't know the quantity of metadata changes (selinux policy,
permissions, all other xattrs) that happen in the course of a release;
but in an "Atomic" context it looks like the only option is to
duplicate the affected files to uniquely set new metadata on just that
file in a particular tree. The alternative, changing the metadata on the
hardlink, punches through to the original file in a completely
different tree, affecting all trees, and is therefore not atomic. (On
Btrfs this duplication can be made efficient with reflinks instead of
hardlinks, but that's trivia.)


-- 
Chris Murphy
___
cloud mailing list
cloud@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/cloud
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

