Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Matthias Beyer
Hi,

I found and solved the issue.

So first, I checked the kernel versions in my generations:

106 - 3.18.11
107 - 3.18.13
108 - 3.18.13
109 - 3.18.13
110 - 3.18.13
111 - 3.18.13
112 - 3.18.13

(the last two were other test builds).
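
The generation-to-kernel list above can be produced with a small shell
function. This is only a sketch, not necessarily how the list was made:
it assumes the usual NixOS layout, where every generation is a
system-N-link symlink under /nix/var/nix/profiles and exposes a
kernel-modules/lib/modules/<version> directory. The function name
list_generation_kernels is made up for illustration.

```shell
# Print "<generation> - <kernel version>" for every system generation.
# Assumption: generations live in $1 (default /nix/var/nix/profiles) as
# system-<N>-link, each containing kernel-modules/lib/modules/<version>/.
list_generation_kernels() {
  profiles=${1:-/nix/var/nix/profiles}
  for link in "$profiles"/system-*-link; do
    [ -e "$link" ] || continue          # glob did not match, or dangling link
    gen=${link##*/system-}              # drop directory and "system-" prefix
    gen=${gen%-link}                    # drop "-link" suffix
    for moddir in "$link"/kernel-modules/lib/modules/*/; do
      [ -d "$moddir" ] || continue      # no modules directory in this generation
      version=${moddir%/}
      printf '%s - %s\n' "$gen" "${version##*/}"
    done
  done | sort -n                        # order by generation number
}
```

Calling list_generation_kernels without arguments reads the default
profile directory; a different directory can be passed as the first
argument.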

I guessed that the issue was the kernel ... because nothing else
changed. There was no channel update and I didn't fiddle around in the
configuration for the system (at least not in these parts, only
packages and containers a bit).

So, I got my local clone of the nixpkgs repo onto the commit which was
used to build generation 106:

2d8cfe76a9e4f05e391d30f1654d45dee5993b8a

And rebuilt the system. This worked. The reboot worked too, so the new
generation, 113, is now on kernel 3.18.11 again.

I then tried to change the kernel version to 4.0 and 3.19 (in this
order, because newer is better), but both failed to build because the
ati driver can't be built due to _compiler errors_ (seriously, what
the hell?).

So, I'm back on 3.18.11. Unfortunately, I carry my own patches because
I would otherwise have to wait for the channel update. So I re-applied
my patches on top of the mentioned commit:

gitolite: v3.6.2 -> v3.6.3
snort: version fix (fixes download error)
daq: version fix (fixes download error)

These are unpublished commits.

---

That's it. While I don't have any concerns about my machine, because
Nix is great and I can boot into old versions of the system and fix
everything, I really can't consider this a great user experience.
Debugging the issue was not that complicated (despite my little
hangover), but it was unnecessary. I'd really like to see this
situation improved. And I'd really like to see kernel 4.0 on my
machine, as I'm waiting for the AMD graphics card fixes which are in
4.0.

I mean, I now build my system from my own clone of nixpkgs, not the
way I want to:

nix-channel --update
nixos-rebuild switch

which is what should be used. No, I have to run:

git checkout commit
nixos-rebuild switch -I nixpkgs=~/my/clone/of/nixpkgs
# build fails
# try again with another commit
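
The manual try-a-commit loop above could be scripted roughly as
follows. This is a sketch under assumptions: the function names
checkout_commit, build_system and first_building_commit are invented
here, and it uses "nixos-rebuild build" (build without activating)
instead of "switch", so that a failing candidate leaves the running
system alone.

```shell
# Check out one nixpkgs commit ($2) in the clone at $1.
checkout_commit() {
  git -C "$1" checkout --quiet "$2"
}

# Build (but do not activate) the system against the clone at $1.
build_system() {
  nixos-rebuild build -I "nixpkgs=$1"
}

# Try each commit in order and print the first one that builds.
# Returns non-zero if none of the candidates build.
first_building_commit() {
  repo=$1
  shift
  for commit in "$@"; do
    if checkout_commit "$repo" "$commit" && build_system "$repo"; then
      printf '%s\n' "$commit"
      return 0
    fi
    echo "build failed for $commit, trying the next one" >&2
  done
  return 1
}
```

For example, first_building_commit ~/my/clone/of/nixpkgs followed by a
list of candidate commits prints the first commit that builds; a
"nixos-rebuild switch" against that checkout would then activate it.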

I don't want to blame anyone for this; this mail is mainly meant to
document the issue. But hey, these things really shouldn't happen,
right?

So, to close this here and now... I hope I see some of you guys next
weekend at tuebix in Tübingen, Germany!

On 04-06-2015 15:06:51, Matthias Beyer wrote:
 Hi,
 
 I have a problem with some of my generations.
 


-- 
Mit freundlichen Grüßen,
Kind regards,
Matthias Beyer

Proudly sent with mutt.
Happily signed with gnupg.


___
nix-dev mailing list
nix-dev@lists.science.uu.nl
http://lists.science.uu.nl/mailman/listinfo/nix-dev


Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Linus Arver
Hello Matthias,

 When rebooting, I booted into my newest generation, which was 109 by
 this time. But I got an error in stage 1, telling me that my root
 partition couldn't be mounted as the device did not come up (LUKS
 encrypted SSD, root on /dev/sda2). It asked me
 
 dm_mod loaded?

FWIW, this is probably relevant:
http://lists.science.uu.nl/pipermail/nix-dev/2015-May/017198.html

I was hit by that regression some weeks ago and had to cherry-pick
the bugfix commit from Nixpkgs.


Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Matthias Beyer
Just to append,

the mentioned issue with the redrawing of the terminals is back again.

On 04-06-2015 16:07:14, Matthias Beyer wrote:
 Hi,
 
 I found and solved the issue.
 

-- 
Mit freundlichen Grüßen,
Kind regards,
Matthias Beyer

Proudly sent with mutt.
Happily signed with gnupg.



Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Jascha Geerds
https://github.com/NixOS/nixpkgs/issues/7859 ?


-- 
  Jascha Geerds
  j...@ekby.de


Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Matthias Beyer
On 04-06-2015 16:16:27, Jascha Geerds wrote:
 https://github.com/NixOS/nixpkgs/issues/7859 ?
 

Thanks, that helped quite a bit.

-- 
Mit freundlichen Grüßen,
Kind regards,
Matthias Beyer

Proudly sent with mutt.
Happily signed with gnupg.




Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Marc Weber
I also have a problem with LUKS, which I haven't debugged yet.
Some lines appear saying that the target link already exists.

I can boot this way:

In grub, hit e and add boot.shell_on_fail or shell_on_fail.
Then you can debug the mounting / boot process, e.g. enter an
interactive shell and mount your partition manually on /root* something.

After exiting I can boot. I don't restart often so I didn't investigate.

A google search told me it could be a race condition.

So have a look at shell_on_fail in the stage-1 init scripts to learn
more about the startup process; then you can debug.

Marc Weber



Re: [Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Matthias Beyer
On 04-06-2015 14:15:12, Marc Weber wrote:
 I can boot this way:
 
 in grub hit e, add boot.shell_on_fail or shell_on_fail,
 then you can debug the mounting / boot process, eg enter interactive
 shell, then you can mount your partition manually on /root* something
 
 After exiting I can boot. I don't restart often so I didn't investigate.
 
 A google search told me it could be a race condition.
 
 Thus have a look at the shell_on_fail within stage-1 and stage-1 init
 scripts to learn more about the startup process - then you can debug.

Also not really user friendly, is it?

Anyway, I'm not using grub, I'm using gummiboot, and I don't know how
to enter the interactive shell with it. But I got it working, so
everything is fine again.

-- 
Mit freundlichen Grüßen,
Kind regards,
Matthias Beyer

Proudly sent with mutt.
Happily signed with gnupg.




[Nix-dev] NixOS won't boot anymore in certain generations, don't know why (Stage-1 error)

2015-06-04 Thread Matthias Beyer
Hi,

I have a problem with some of my generations.

Today (after installing chromium, but I don't think this has
anything to do with it), I noticed that my xterms got redrawn after
switching to them (I'm using i3, so if a window is not shown and I
bring it up, a redraw happens).

So, if I switched to an xterm where I had executed something like
alsamixer or tree /tmp, it got redrawn line by line.

I did not understand why this happened, but I guessed it was some
driver problem, where a driver went bad or something. So I decided to
reboot.

When rebooting, I booted into my newest generation, which was 109 at
that time. But I got an error in stage 1, telling me that my root
partition couldn't be mounted because the device did not come up
(LUKS-encrypted SSD, root on /dev/sda2). It asked me

dm_mod loaded?

So I tried previous generations and had success with 106 (107-109 did
not work). I checked my config, and I saw:

boot.initrd.kernelModules = [ "fbcon" "ext4" "dm_crypt" ];

that dm_mod was indeed missing. So I changed it to:

boot.initrd.kernelModules = [ "fbcon" "ext4" "dm_mod" "dm_crypt" ];

And rebuilt the system, resulting in generation 110. I tried to boot
that, but the same error happened.

I'm on kernel 3_18_4, if this matters.

So my problem is that I don't know what went wrong or how to fix it.
Unfortunately, I don't know which configuration I built generation 106
from (my config is git-tracked). I'd show you the diff between the
generations, but well... I don't know which revision it was.

How can I debug this and, more importantly, how can I fix it?
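
One possible starting point for the debugging question is to compare
which store paths a working and a failing generation point at. A
minimal sketch, assuming the usual NixOS layout (generation links
under /nix/var/nix/profiles, each exposing kernel, initrd and
kernel-modules); compare_generations is a hypothetical helper name,
not an existing tool.

```shell
# Compare the boot-relevant store paths of two generations ($1 and $2).
# Assumption: generations are system-<N>-link entries in $3
# (default /nix/var/nix/profiles).
compare_generations() {
  profiles=${3:-/nix/var/nix/profiles}
  for item in kernel initrd kernel-modules; do
    pa=$profiles/system-$1-link/$item
    pb=$profiles/system-$2-link/$item
    if [ ! -e "$pa" ] || [ ! -e "$pb" ]; then
      printf '%s: missing in at least one generation\n' "$item"
      continue
    fi
    a=$(readlink -f "$pa")              # resolve symlinks to the real path
    b=$(readlink -f "$pb")
    if [ "$a" = "$b" ]; then
      printf '%s: same\n' "$item"
    else
      printf '%s: differs\n' "$item"
      printf '  %s: %s\n' "$1" "$a"
      printf '  %s: %s\n' "$2" "$b"
    fi
  done
}
```

With the generations from this mail, compare_generations 106 109 would
show whether the kernel, the initrd or the module tree changed between
the booting and the failing generation.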

-- 
Mit freundlichen Grüßen,
Kind regards,
Matthias Beyer

Proudly sent with mutt.
Happily signed with gnupg.

