Re: [Xen-devel] Debian Kernel: Xen-fb-frontend as a module?

2014-09-23 Thread Konrad Rzeszutek Wilk
On Tue, Sep 23, 2014 at 04:57:26PM +0100, Ian Campbell wrote:
 On Tue, 2014-09-23 at 16:43 +0100, David Vrabel wrote:
  On 23/09/14 15:30, Ian Campbell wrote:
   create !
   title it 30s delay loading xenfb driver on some systems
   owner it Konrad Rzeszutek Wilk konrad.w...@oracle.com
   thanks
   
   Hi James,
   
   Some of the other Xen devs were discussing an issue which sounded
   awfully similar to this one, so I am copying the xen-devel list and
   creating a Xen bug to track the issue, please CC xen-devel (no need to
   subscribe but you may be moderated the first time), since the tracker
   slurps mails from the list.
   
   I'm not sure of the details of the other issue but it involved
   http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b1a3b1c8a8d963424c4699efa64dd8986b2f76d7
hopefully Konrad or one of the others will follow up.

That was with an HVM guest running under Xen 4.1 in which this
guest config was used:

vfb = ['type=vnc,vncunused=1,vnclisten=0.0.0.0']

Xend would create XenStore keys for the PV framebuffer and also make
sure that the QEMU VGA driver was running. The end result was that the guest
would boot up to the Xorg VGA driver, but the frame buffer console (so from
the moment GRUB2 started Linux up to Xorg) would try to use xen-fbfront.

And since this is an HVM guest and the VNC viewer was slurping contents from the
VGA buffer - which was not used at all - we wouldn't get anything.

Reverting the above patch fixed the issue.
   
   For xen-devel, the first two mails in this thread are
   https://lists.debian.org/debian-kernel/2014/09/msg00229.html and 
   https://lists.debian.org/debian-kernel/2014/09/msg00233.html
  
  The wait stuff for xenbus devices looks like it pre-dates distros handling
  asynchronous devices with suitable initrds etc.
 
 Perhaps, but I think most distros still don't use rootwait by default so
 unless / turns up reasonably promptly (single digit seconds) the boot
 may fail.
 
  I think we need a command line configurable white list of device types
  to wait for.  The default should be (to match what was done historically):
  
  PV: vbd, vif, pci, vfb, vkbd.
  HVM: vbd, vif.
  
  I also think it should be possible to default (via a config option) to
  an empty white list.
 
 Irrespective of the above this seems like a reasonable enough idea to
 me.

What about ARM?
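
As a rough illustration of the white list idea above (a sketch only, not
kernel code): the parameter name `xenbus_wait` and the helper below are
hypothetical, and only the PV and HVM default lists are taken from this
thread. An explicitly empty value means "wait for nothing", and
`default_empty` stands in for the proposed config option. ARM is left out
because the thread does not say what its default should be.

```python
# Hypothetical sketch: neither the "xenbus_wait" parameter nor this
# parser exists in Linux. Only the default lists come from the thread.
DEFAULT_WAIT = {
    "pv": ("vbd", "vif", "pci", "vfb", "vkbd"),
    "hvm": ("vbd", "vif"),
}


def devices_to_wait_for(guest_type, cmdline, param="xenbus_wait",
                        default_empty=False):
    """Return the xenbus device types the boot should wait for."""
    prefix = param + "="
    chosen = None
    for token in cmdline.split():
        if token.startswith(prefix):
            value = token[len(prefix):]
            # Later occurrences override earlier ones; "" yields ().
            chosen = tuple(t for t in value.split(",") if t)
    if chosen is not None:
        return chosen
    if default_empty:
        return ()
    return DEFAULT_WAIT[guest_type]
```

With no parameter on the command line an HVM guest would wait for vbd and
vif only, which matches the historical behaviour described above.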
 
 Ian.
 
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20140923191458.gz3...@laptop.dumpdata.com



Re: [Xen-devel] Current state of Xen microcode (Was: Re: uploading 3.16~rc5-1~exp1)

2014-08-04 Thread Konrad Rzeszutek Wilk
On Wed, Jul 23, 2014 at 03:59:37PM +0100, Jan Beulich wrote:
  On 15.07.14 at 16:59, konrad.w...@oracle.com wrote:
  On Tue, Jul 15, 2014 at 09:53:32AM +0100, Ian Campbell wrote:
  Adding xen-devel and some of the Linux maints,
  
  On Mon, 2014-07-14 at 23:22 +0200, maximilian attems wrote:
   I will upload tomorrow Tuesday around 22h00 UT to experimental.
   
   There are two TODOS concerning the not yet forwarded Debian patches:
   - cgroups
   - xen microcode
   
   
   I consider both not a blocker, but would be happy if Xen guys could
   have a look for what is needed.
  
  The xen microcode patches which maks is referring to are these:
  
  http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/features/all/xen/
  which are a forward port of Jeremy's old microcode_xen.ko driver.
  
  They are also at my branch
  http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=stable/misc
  
  Hm, I should rebase them at some point.
  
  I've not been keeping up on Xen x86 microcode stuff these days but I
  think we don't need this any more with modern Xen since we can parse the
  microcode blob off the front of the initrd, is that right?
  
  Right. And best of all, the support for that is in dracut so
  it can happen automatically (though you still need to add
  'ucode=scan' to the GRUB_CMDLINE_XEN parameter in /etc/default/grub
  and 'early_microcode=yes' to /etc/dracut.conf).
 
 Except that all this still doesn't take care of updating microcode at
 runtime. Yet that should - like kexec - be implemented in the
 respective tools such that no kernel involvement is required.

What are such tools? My recollection is that it is a matter of
sticking the files in /lib/firmware and then udev will, on
detecting a CPU coming online:
 - load the microcode module
 - echo 1 to the 'reload' sysfs parameter

Perhaps the solution is simple:
 - write a Xen userspace tool that will do this loading via
   the hypercalls.
 - plug said tool into the udev script that comes with Xen.
 - distros will install said udev scripts when they install Xen rpms?

That sounds too simple to be true. I must be missing something :-)
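
For reference, the bare-metal half of that recollection can be sketched
roughly as below. This is only the native Linux analogue, not the proposed
Xen tool: it assumes the sysfs file /sys/devices/system/cpu/microcode/reload
of the native microcode driver, and the function names are made up for
illustration. A Xen version would issue the hypercall instead of writing
to sysfs.

```python
import os

# sysfs file of the native Linux microcode driver (assumed to be present;
# under Xen the hypervisor owns microcode loading, so this would not apply).
RELOAD_PATH = "/sys/devices/system/cpu/microcode/reload"


def has_firmware(firmware_dir):
    """True if firmware_dir exists and holds at least one regular file."""
    if not os.path.isdir(firmware_dir):
        return False
    return any(os.path.isfile(os.path.join(firmware_dir, name))
               for name in os.listdir(firmware_dir))


def reload_microcode(firmware_dir, reload_path=RELOAD_PATH):
    """Write 1 to the reload file; return False if there is nothing to load."""
    if not has_firmware(firmware_dir):
        return False
    with open(reload_path, "w") as handle:
        handle.write("1\n")
    return True
```

A udev rule would call something like reload_microcode("/lib/firmware/amd-ucode")
when a CPU comes online; both paths are parameters so the logic can be tried
against a scratch directory first.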
 
 Jan
 


Archive: https://lists.debian.org/20140804140730.gd18...@laptop.dumpdata.com



Re: [Xen-devel] Current state of Xen microcode (Was: Re: uploading 3.16~rc5-1~exp1)

2014-08-04 Thread Konrad Rzeszutek Wilk
On Mon, Aug 04, 2014 at 03:40:45PM +0100, Jan Beulich wrote:
  On 04.08.14 at 16:07, konrad.w...@oracle.com wrote:
  On Wed, Jul 23, 2014 at 03:59:37PM +0100, Jan Beulich wrote:
   On 15.07.14 at 16:59, konrad.w...@oracle.com wrote:
   On Tue, Jul 15, 2014 at 09:53:32AM +0100, Ian Campbell wrote:
   Adding xen-devel and some of the Linux maints,
   
   On Mon, 2014-07-14 at 23:22 +0200, maximilian attems wrote:
I will upload tomorrow Tuesday around 22h00 UT to experimental.

There are two TODOS concerning the not yet forwarded Debian patches:
- cgroups
- xen microcode


I consider both not a blocker, but would be happy if Xen guys could
have a look for what is needed.
   
   The xen microcode patches which maks is referring to are these:
   
   
   http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/features/all/xen/
   which are a forward port of Jeremy's old microcode_xen.ko driver.
   
   They are also at my branch
   
   http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=stable/misc
   
   Hm, I should rebase them at some point.
   
   I've not been keeping up on Xen x86 microcode stuff these days but I
   think we don't need this any more with modern Xen since we can parse the
   microcode blob off the front of the initrd, is that right?
   
   Right. And best of all, the support for that is in dracut so
   it can happen automatically (though you still need to add
   'ucode=scan' to the GRUB_CMDLINE_XEN parameter in /etc/default/grub
   and 'early_microcode=yes' to /etc/dracut.conf).
  
  Except that all this still doesn't take care of updating microcode at
  runtime. Yet that should - like kexec - be implemented in the
  respective tools such that no kernel involvement is required.
  
  What are such tools?
 
 Wasn't there a microcode_ctl tool/package?

There was but it is obsolete:

***
 What it does
***
Deploy an Intel and AMD microcode. This tool is obsolete and the microcode
is the subject to be distributed via kernel-firmware, however Intel still
does not supply the microcode in a form consumable by the Linux's microcode

 
  My recollection is that it is a matter of
  sticking the files in /lib/firmware and then udev will, on
  detecting a CPU coming online:
   - load the microcode module
   - echo 1 to the 'reload' sysfs parameter
  
  Perhaps the solution is simple:
   - write a Xen userspace tool that will do this loading via
     the hypercalls.
   - plug said tool into the udev script that comes with Xen.
   - distros will install said udev scripts when they install Xen rpms?
  
  That sounds too simple to be true. I must be missing something :-)
 
 That would be an option too (namely if, say, the microcode_ctl tool
 is considered deprecated/dead), and indeed can't be very difficult.

Vacation does wonders for one's brain! I shall put it on the TODO list.
Thank you!
 
 Jan
 


Archive: https://lists.debian.org/20140804150248.gd19...@laptop.dumpdata.com



Re: Current state of Xen microcode (Was: Re: uploading 3.16~rc5-1~exp1)

2014-07-15 Thread Konrad Rzeszutek Wilk
On Tue, Jul 15, 2014 at 09:53:32AM +0100, Ian Campbell wrote:
 Adding xen-devel and some of the Linux maints,
 
 On Mon, 2014-07-14 at 23:22 +0200, maximilian attems wrote:
  I will upload tomorrow Tuesday around 22h00 UT to experimental.
  
  There are two TODOS concerning the not yet forwarded Debian patches:
  - cgroups
  - xen microcode
  
  
  I consider both not a blocker, but would be happy if Xen guys could
  have a look for what is needed.
 
 The xen microcode patches which maks is referring to are these:
 http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/features/all/xen/
 which are a forward port of Jeremy's old microcode_xen.ko driver.

They are also at my branch
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=stable/misc

Hm, I should rebase them at some point.
 
 I've not been keeping up on Xen x86 microcode stuff these days but I
 think we don't need this any more with modern Xen since we can parse the
 microcode blob off the front of the initrd, is that right?

Right. And best of all, the support for that is in dracut so
it can happen automatically (though you still need to add
'ucode=scan' to the GRUB_CMDLINE_XEN parameter in /etc/default/grub
and 'early_microcode=yes' to /etc/dracut.conf).
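
The manual GRUB step could be automated along these lines. This is a
minimal sketch, assuming the GRUB defaults file uses the usual
GRUB_CMDLINE_XEN="..." form with double quotes; the function name is made
up for illustration and nothing like it ships with Xen or GRUB as far as
this thread says.

```python
import re

# Matches a line such as: GRUB_CMDLINE_XEN="dom0_mem=1024M"
# (GRUB_CMDLINE_XEN_DEFAULT is deliberately not matched).
XEN_LINE = re.compile(r'^(GRUB_CMDLINE_XEN=)"([^"]*)"[ \t]*$', re.MULTILINE)


def ensure_xen_option(grub_defaults, option="ucode=scan"):
    """Return the file text with `option` present in GRUB_CMDLINE_XEN."""
    match = XEN_LINE.search(grub_defaults)
    if match is None:
        # No such line yet: append one, keeping the file newline-terminated.
        sep = "" if grub_defaults.endswith("\n") or not grub_defaults else "\n"
        return grub_defaults + sep + 'GRUB_CMDLINE_XEN="%s"\n' % option
    words = match.group(2).split()
    if option in words:
        return grub_defaults  # already there, leave the text untouched
    words.append(option)
    new_line = '%s"%s"' % (match.group(1), " ".join(words))
    return grub_defaults[:match.start()] + new_line + grub_defaults[match.end():]
```

The function only transforms text, so it is safe to run repeatedly; the
GRUB configuration still has to be regenerated afterwards with the
distro's usual tool.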

 
 (this is predicated on Xen 4.4 hitting unstable, which is underway)

unstable == Debian unstable?

 
 Thanks,
 Ian.
 


Archive: https://lists.debian.org/20140715145934.gm3...@laptop.dumpdata.com



Re: [Xen-devel] Current state of Xen microcode (Was: Re: uploading 3.16~rc5-1~exp1)

2014-07-15 Thread Konrad Rzeszutek Wilk
On Tue, Jul 15, 2014 at 04:06:00PM +0100, Ian Campbell wrote:
 On Tue, 2014-07-15 at 10:59 -0400, Konrad Rzeszutek Wilk wrote:
  On Tue, Jul 15, 2014 at 09:53:32AM +0100, Ian Campbell wrote:
   Adding xen-devel and some of the Linux maints,
   
   On Mon, 2014-07-14 at 23:22 +0200, maximilian attems wrote:
I will upload tomorrow Tuesday around 22h00 UT to experimental.

There are two TODOS concerning the not yet forwarded Debian patches:
- cgroups
- xen microcode


I consider both not a blocker, but would be happy if Xen guys could
have a look for what is needed.
   
   The xen microcode patches which maks is referring to are these:
   http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/features/all/xen/
   which are a forward port of Jeremy's old microcode_xen.ko driver.
  
  They are also at my branch
  http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=stable/misc
  
  Hm, I should rebase them at some point.
   
   I've not been keeping up on Xen x86 microcode stuff these days but I
   think we don't need this any more with modern Xen since we can parse the
   microcode blob off the front of the initrd, is that right?
  
  Right. And best of all, the support for that is in dracut so
  it can happen automatically (though you still need to add
  'ucode=scan' to the GRUB_CMDLINE_XEN parameter in /etc/default/grub
  and 'early_microcode=yes' to /etc/dracut.conf).
 
 Hrm, I thought it was transparent. Debian doesn't use dracut, it uses
 initramfs-tools. So perhaps there is some work to be done here.
 
 I understand ucode=scan, but what does the dracut option do?

It packages the microcode blobs and sticks them on the front of the initramfs.

I think 'initramfs-tools' will need some code there. Here is what
I did in dracut:

http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=5f2c30d9bcd614d546d5c55c6897e33f88b9ab90

http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=b5b608e44ade93bee54d274f5edc6aad6dc45288

http://git.kernel.org/cgit/boot/dracut/dracut.git/commit/?id=d8b04dc1840047a7533d19f577f30f19d42e2d33
 
   (this is predicated on Xen 4.4 hitting unstable, which is underway)
  
  unstable == Debian unstable?
 
 Yes.
 
 Ian.
 


Archive: https://lists.debian.org/20140715153500.gw3...@laptop.dumpdata.com



Re: [Xen-devel] Current state of Xen microcode (Was: Re: uploading 3.16~rc5-1~exp1)

2014-07-15 Thread Konrad Rzeszutek Wilk
On Tue, Jul 15, 2014 at 06:53:04PM +0100, Ben Hutchings wrote:
 On Tue, 2014-07-15 at 11:35 -0400, Konrad Rzeszutek Wilk wrote:
  On Tue, Jul 15, 2014 at 04:06:00PM +0100, Ian Campbell wrote:
   On Tue, 2014-07-15 at 10:59 -0400, Konrad Rzeszutek Wilk wrote:
On Tue, Jul 15, 2014 at 09:53:32AM +0100, Ian Campbell wrote:
 Adding xen-devel and some of the Linux maints,
 
 On Mon, 2014-07-14 at 23:22 +0200, maximilian attems wrote:
  I will upload tomorrow Tuesday around 22h00 UT to experimental.
  
  There are two TODOS concerning the not yet forwarded Debian patches:
  - cgroups
  - xen microcode
  
  
  I consider both not a blocker, but would be happy if Xen guys could
  have a look for what is needed.
 
 The xen microcode patches which maks is referring to are these:
 http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/features/all/xen/
 which are a forward port of Jeremy's old microcode_xen.ko driver.

They are also at my branch
http://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=stable/misc

Hm, I should rebase them at some point.
 
 I've not been keeping up on Xen x86 microcode stuff these days but I
 think we don't need this any more with modern Xen since we can parse the
 microcode blob off the front of the initrd, is that right?

Right. And best of all, the support for that is in dracut so
it can happen automatically (though you still need to add
'ucode=scan' to the GRUB_CMDLINE_XEN parameter in /etc/default/grub
and 'early_microcode=yes' to /etc/dracut.conf).
   
   Hrm, I thought it was transparent. Debian doesn't use dracut, it uses
   initramfs-tools. So perhaps there is some work to be done here.
   
   I understand ucode=scan, but what does the dracut option do?
  
  Packages the blobs and sticks them to the initramfs.
  
  I think 'initramfs-tools' will need some code there.
 [...]
 
 initramfs-tools can already prepend microcode in the way Linux expects.
 I assume Xen supports the same format (it's a little weird for it to be
 parsing the dom0 initrd, but OK...).  But we're presumably missing the

It scans all of the binary blobs that are attached, looking for the
proper cpio signature. So you could do it as a separate cpio binary.

Or as a binary blob (cat /lib/firmware/*ucode/* > /boot/multiboot.bin)
and the 'multiboot.bin' can be part of the multiboot stanza:
module /multiboot.bin

And you add 'ucode=-1' for it to take the last module and use that
as its source of microcode.

However, 'ucode=scan' is more in line with what Linux has with
the early microcode loading, so you are better off using that and
it will be less work.
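
To make "looking for the proper cpio signature" concrete, here is a small
sketch of walking an uncompressed "newc" cpio archive and pulling one entry
out of it. It is not Xen's code. The entry name used in the demo is the
path Linux's early loader expects for Intel; that Xen looks for the same
path is an assumption, in line with the guess above that it supports the
same format.

```python
CPIO_MAGIC = b"070701"   # "newc" cpio header magic
HEADER_LEN = 110         # 6 magic bytes + 13 fields of 8 hex digits


def align4(offset):
    """Round offset up to the next multiple of four."""
    return (offset + 3) & ~3


def find_in_cpio(blob, wanted):
    """Return the contents of entry `wanted`, or None if it is not found."""
    offset = 0
    while (blob[offset:offset + 6] == CPIO_MAGIC
           and offset + HEADER_LEN <= len(blob)):
        fields = [int(blob[offset + 6 + 8 * i:offset + 14 + 8 * i], 16)
                  for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = offset + HEADER_LEN
        # namesize counts the trailing NUL, which is dropped here.
        name = blob[name_start:name_start + namesize - 1].decode("ascii", "replace")
        data_start = align4(name_start + namesize)
        if name == "TRAILER!!!":
            break
        if name == wanted:
            return blob[data_start:data_start + filesize]
        offset = align4(data_start + filesize)
    return None


def make_entry(name, data):
    """Build one newc entry in memory (demo helper, regular-file mode)."""
    raw_name = name.encode("ascii") + b"\0"
    values = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw_name), 0]
    entry = CPIO_MAGIC + b"".join(b"%08X" % v for v in values) + raw_name
    entry += b"\0" * (-len(entry) % 4)   # pad header + name to 4 bytes
    entry += data
    entry += b"\0" * (-len(entry) % 4)   # pad data to 4 bytes
    return entry


# Demo: a two-entry archive holding a fake microcode blob.
UCODE_PATH = "kernel/x86/microcode/GenuineIntel.bin"
archive = make_entry(UCODE_PATH, b"fake-ucode") + make_entry("TRAILER!!!", b"")
payload = find_in_cpio(archive, UCODE_PATH)
```

The 'ucode=-1' variant needs none of this, since the whole module is taken
as the raw microcode blob.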
 GRUB integration to get 'ucode=scan' added automatically.

nods
 
 Ben.
 
 -- 
 Ben Hutchings
 Hoare's Law of Large Problems:
 Inside every large problem is a small problem struggling to get out.



Archive: https://lists.debian.org/20140715180821.gd2...@laptop.dumpdata.com



Re: [Xen-devel] pvops microcode support for AMD FAM >= 15

2012-12-19 Thread Konrad Rzeszutek Wilk
On Thu, Dec 06, 2012 at 08:34:31AM +0000, Ian Campbell wrote:
 (trim quote please...)
 On Wed, 2012-12-05 at 21:47 +0000, Konrad Rzeszutek Wilk wrote:
  Do you want to prep a patch that I can stick in my 'microcode' branch?
  .. That I will at some point try to upstream.
 
 You might want to look back at the archives when Jeremy first tried to
 upstream this work, it was a vehement No and the resulting thread was
 not pretty.
 
 Now that we have early loading via the hypervisor in 4.2 and Linux is
 finally in the process of growing its own early microcode loading
 solution I suspect the No would be even firmer.
 
 It is on xenbits if you want it anyway:
 
 git://xenbits.xen.org/people/ianc/linux-2.6.git debian/wheezy/microcode

Thx. Pulled it in my stable/misc branch.
 
 About the only argument I can see for continuing to try upstreaming this
 stuff is that in
 http://www.gossamer-threads.com/lists/linux/kernel/1583630 Fenghua says:
 
 Note, however, that Linux users have gotten used to being able
 to install a microcode patch in the field without having a
 reboot; we support that model too.
 
 i.e. this is an argument for keeping the previous scheme in parallel,
 which I suppose is an argument for supporting the same under Xen (I
 don't know if it's a good one though).
 
 Ian.
 
 -- 
 Ian Campbell
 
 
 All the existing 2.0.x kernels are to buggy for 2.1.x to be the
 main goal.
   -- Alan Cox
 


Archive: http://lists.debian.org/20121219202828.gl15...@phenom.dumpdata.com



Re: [Xen-devel] pvops microcode support for AMD FAM >= 15

2012-12-05 Thread Konrad Rzeszutek Wilk
On Wed, Dec 05, 2012 at 12:46:39PM +0000, Ian Campbell wrote:
 On Mon, 2012-11-26 at 13:44 +0000, Jan Beulich wrote:
   On 26.11.12 at 14:21, Ian Campbell i...@hellion.org.uk wrote:
   Debian has decided to take Jeremy's microcode patch [0] as an interim
   measure for their next release. (TL;DR -- Debian is shipping pvops Linux
   3.2 and Xen 4.1 in the next release. See http://bugs.debian.org/693053 
   and https://lists.debian.org/debian-devel/2012/11/msg00141.html for some
   more background).
   
   However the patch is a bit old and predates the introduction of
   separate firmware files for AMD family >= 15h. Looking at the SuSE
   forward ported classic Xen patches it seems like the following patch is
   all that is required. But it seems a little too simple to be true and I
   don't have any such processors to test on.
   
   Jan, can you recall if it really is that easy on the kernel side ;-)
  
  While so far I didn't myself run anything on post-Fam10 systems
  either, it really ought to be that easy - the patch format didn't
  change, it's just that they decided to split the files by family to
  keep them manageable.
  
  The only other thing to check for is that you don't have any
  artificial size restriction left in that code (I think patch files early
  on were limited to 4k in size, and that got lifted during the last
  couple of years).
 
 I managed to find a machine and try this and it turns out that all that
 was missing from the kernel side was:
 
 @@ -58,7 +58,7 @@
  
  static enum ucode_state xen_request_microcode_fw(int cpu, struct device *device)
  {
 -	char name[30];
 +	char name[36];
  	struct cpuinfo_x86 *c = cpu_data(cpu);
  	const struct firmware *firmware;
  	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;

Do you want to prep a patch that I can stick in my 'microcode' branch?
.. That I will at some point try to upstream.
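
For the record, the arithmetic behind the one-line change above (name[30]
to name[36]) can be checked as follows. This is a sketch that assumes the
driver uses the same firmware file names as the upstream AMD microcode
loader (one shared file before family 15h, a per-family file from 15h on);
the helper names are made up for illustration.

```python
def amd_firmware_name(family):
    """Firmware file name, assuming the upstream AMD naming scheme."""
    if family >= 0x15:
        return "amd-ucode/microcode_amd_fam%02xh.bin" % family
    return "amd-ucode/microcode_amd.bin"


def buffer_needed(name):
    """Bytes a C char array needs: the characters plus the trailing NUL."""
    return len(name) + 1


OLD_SIZE = 30   # char name[30];
NEW_SIZE = 36   # char name[36];

old_style = buffer_needed(amd_firmware_name(0x10))
per_family = buffer_needed(amd_firmware_name(0x15))
```

Under that assumption the pre-15h name fits in the old buffer, while the
per-family name does not and only fits once the array is grown.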


Archive: http://lists.debian.org/20121205214741.ga1...@phenom.dumpdata.com



Re: [opensuse-kernel] Re: [RFC] Simplifying kernel configuration for distro issues

2012-07-19 Thread Konrad Rzeszutek Wilk
On Thu, Jul 19, 2012 at 07:53:10PM +0200, Borislav Petkov wrote:
 On Thu, Jul 19, 2012 at 10:06:44AM -0700, Linus Torvalds wrote:
  On Thu, Jul 19, 2012 at 9:48 AM, Borislav Petkov b...@amd64.org wrote:
  
   Seriously, this helps only in the cases where the stuff the distro
   actually needs is in modules. So, there probably are obscure situations
   where you need to enable stuff which is bool and not M.
  
  Sadly, not obscure at all.
  
  Most of the *drivers* are modules, but most of the distro config
  options are indeed booleans (or, if tristate, =y).
  
  Even driver-wise, there are some things that are often =y, even though
  you generally don't want them.
 
 Tell me about it. I'm always pissed off when someone thinks his stuff is
 very important and sets his sacred option to be =y/=m by default so the
 wider audience can at least compile-test it while the majority of the
 machines don't actually need it.
 
 A more coarse-grained config where most of the stuff is off by default
 could take care of that probably.
 
  PCMCIA? Not even *laptops* have that shit any more, but having
  built-in cardbus support almost certainly helps in a distro kernel for
  booting of certain odder cases.
 
 Yeah, distros need the one-size-fits-all thing so they have to enable
 *everything*.
 
  Xen support? Odd partition tables? All the different AGP versions?
  Many of us couldn't care less, but again, it makes sense in the actual
  distro kernel, even if it does *not* necessarily make sense in a
  personalized one.
 
 Yep.

I proposed something that would solve some of this - but not during
compile time but rather during boot-time
[http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/99.html]
(interestingly enough hpa was first to propose it 10 years ago :-)

The goal is to turn built-in components into, well, unloadable components.
That way you won't have quite so much stuff lying around not being
used. Not the full silver bullet, but at least it gets some of this
stuff out of the way and you don't have to worry about the extra
stuff that was built-in.


Archive: http://lists.debian.org/20120719184249.ga6...@phenom.dumpdata.com



Bug#676360: [Xen-devel] [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

2012-06-11 Thread Konrad Rzeszutek Wilk
  Nice. Andrew, any chance you could test this patch on the affected
  Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
  hypervisor or is it really only on Linode and Amazon EC2?
  
 
 Originally, I was able to reproduce the issue easily with a RHEL5
 host. Now, with this patch it's fixed.

OK, so Tested-by: Andrew Jones..
and from my perspective it looks good - so Acked-by: Konrad Rzeszutek Wilk 
konrad.w...@oracle.com

Andrea, any chance you can respin this patch and send it to Linus for 3.5 
please?



Archive: http://lists.debian.org/20120611192738.gm14...@phenom.dumpdata.com



Bug#676360: [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE

2012-06-09 Thread Konrad Rzeszutek Wilk
On Thu, Jun 07, 2012 at 11:00:33PM +0200, Andrea Arcangeli wrote:
 In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
 the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
 contents under Xen.
 
 So instead of dealing only with consistent pmdvals in
 pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
 simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
 where the low 32bit and high 32bit could be inconsistent (to avoid
 having to use cmpxchg8b).

nods
 
 The only guarantee we get from pmd_read_atomic is that if the low part
 of the pmd was found null, the high part will be null too (so the pmd
 will be considered unstable). And if the low part of the pmd is found
 stable later, then it means the whole pmd was read atomically
 (because after a pmd is stable, neither MADV_DONTNEED nor page faults
 can alter it anymore, and we read the high part after the low part).
 
 In the 32bit PAE x86 case, it is enough to read the low part of the
 pmdval atomically to declare the pmd as stable and that's true for
 THP and no THP, furthermore in the THP case we also have a barrier()
 that will prevent any inconsistent pmdvals to be cached by a later
 re-read of the *pmd.

Nice. Andrew, any chance you could test this patch on the affected
Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
hypervisor or is it really only on Linode and Amazon EC2?
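
The guarantee described in the patch text above can be modelled in a few
lines. This is a toy model in Python, not the kernel code: `SplitPmd` and
the `between_reads` hook are made up to stand in for a 64-bit pmd stored
as two 32-bit words and for another CPU changing it between the two reads.

```python
class SplitPmd:
    """A 64-bit entry stored as two 32-bit words."""

    def __init__(self, value):
        self.low = value & 0xFFFFFFFF
        self.high = value >> 32


def pmd_read_atomic(pmd, between_reads=None):
    """Read low then high, without any 64-bit atomic access."""
    low = pmd.low
    if low == 0:
        # Low part is null: report a none pmd and never read the high part.
        return 0
    if between_reads is not None:
        between_reads(pmd)   # models a concurrent update on another CPU
    return (pmd.high << 32) | low


def remap(pmd):
    """Pretend another CPU installs a different entry mid-read."""
    pmd.low = 0x00200067
    pmd.high = 0x2


stable = pmd_read_atomic(SplitPmd(0x0000000123456067))
none = pmd_read_atomic(SplitPmd(0x0000000500000000))
racy = pmd_read_atomic(SplitPmd(0x0000000100000067), between_reads=remap)
```

In the racy case the result mixes the old low word with the new high
word, which is the inconsistency the callers are expected to tolerate
because they only trust the low part.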

 
 Signed-off-by: Andrea Arcangeli aarca...@redhat.com
 ---
  arch/x86/include/asm/pgtable-3level.h |   30 +-
  include/asm-generic/pgtable.h |   10 ++
  2 files changed, 27 insertions(+), 13 deletions(-)
 
 diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
 index 43876f1..cb00ccc 100644
 --- a/arch/x86/include/asm/pgtable-3level.h
 +++ b/arch/x86/include/asm/pgtable-3level.h
 @@ -47,16 +47,26 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
   * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
   * operations.
   *
 - * Without THP if the mmap_sem is hold for reading, the
 - * pmd can only transition from null to not null while pmd_read_atomic runs.
 - * So there's no need of literally reading it atomically.
 + * Without THP if the mmap_sem is hold for reading, the pmd can only
 + * transition from null to not null while pmd_read_atomic runs. So
 + * we can always return atomic pmd values with this function.
   *
   * With THP if the mmap_sem is hold for reading, the pmd can become
 - * THP or null or point to a pte (and in turn become stable) at any
 - * time under pmd_read_atomic, so it's mandatory to read it atomically
 - * with cmpxchg8b.
 + * trans_huge or none or point to a pte (and in turn become stable)
 + * at any time under pmd_read_atomic. We could read it really
 + * atomically here with a atomic64_read for the THP enabled case (and
 + * it would be a whole lot simpler), but to avoid using cmpxchg8b we
 + * only return an atomic pmdval if the low part of the pmdval is later
 + * found stable (i.e. pointing to a pte). And we're returning a none
 + * pmdval if the low part of the pmd is none. In some cases the high
 + * and low part of the pmdval returned may not be consistent if THP is
 + * enabled (the low part may point to previously mapped hugepage,
 + * while the high part may point to a more recently mapped hugepage),
 + * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
 + * of the pmd to be read atomically to decide if the pmd is unstable
 + * or not, with the only exception of when the low part of the pmd is
 + * zero in which case we return a none pmd.
   */
 -#ifndef CONFIG_TRANSPARENT_HUGEPAGE
  static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
  {
   pmdval_t ret;
 @@ -74,12 +84,6 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
  
   return (pmd_t) { ret };
  }
 -#else /* CONFIG_TRANSPARENT_HUGEPAGE */
 -static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 -{
 - return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
 -}
 -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
  
  static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
  {
 diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
 index ae39c4b..0ff87ec 100644
 --- a/include/asm-generic/pgtable.h
 +++ b/include/asm-generic/pgtable.h
 @@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
   /*
* The barrier will stabilize the pmdval in a register or on
* the stack so that it will stop changing under the code.
 +  *
 +  * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
 +  * pmd_read_atomic is allowed to return a not atomic pmdval
 +  * (for example pointing to an hugepage that has never been
 +  * mapped in the pmd). The below checks will only care about
 +  * the low part of the pmd with 32bit PAE x86 anyway, with the
 +  * exception of pmd_none(). So the important thing 

Bug#676360: [Xen-devel] xen: oops at atomic64_read_cx8+0x4

2012-06-07 Thread Konrad Rzeszutek Wilk
On Thu, Jun 07, 2012 at 12:33:55PM +0200, Andrea Arcangeli wrote:
 On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
  Sergio Gelato wrote[1]:
  
That 3.4.1-1~experimental.1 build
   (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
   is even less well-behaved under Xen: I'm getting a kernel OOPS at
   EIP: [c1168e54] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
   The top of the trace message unfortunately scrolled off the console before I
   could see it, and the message doesn't have time to make it to syslog (either
   local or remote).
  [...]
   Non-Xen boots proceed normally.
  
  Yeah, apparently[2] that's caused by
  
  commit 26c191788f18
  Author: Andrea Arcangeli aarca...@redhat.com
  Date:   Tue May 29 15:06:49 2012 -0700
  
  mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
  
  which was also included in Debian kernel 3.2.19-1.
  
  [1] http://bugs.debian.org/676360
  [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
 
 Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

Hmm, so it looks like it used to do this:

 pmd = pmd_offset(pud, addr);
 ..
 pmd_t pmdval = *pmd;

but now you do:
 pmdval_t ret = (pmdval_t)*tmp;
 ret |= ((pmdval_t)*(tmp + 1)) << 32;

which would read the low part first and then the high one
(or is it the other way around?).  The 'pmd_offset' beforehand
manufactures the pmd using the PFN to MFN lookup tree (so
that there aren't any hypercalls or traps).

Hm, with your change, you are still looking at the 'pmd'
and its contents, except that you are reading the low and
then the high part. Why that would trip the hypervisor
is not clear to me. Perhaps in the past it only read the
low bits?

If there was a Xen hypervisor log, that might give some ideas. Is
there any chance that the Linode folks could send that over?

 
 Unfortunately to support pagetable walking with mmap_sem hold for
 reading, we need an atomic read on 32bit PAE if
 CONFIG_TRANSPARENT_HUGEPAGE=y.
 
 The only case requiring this is 32bit PAE with
 CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
 CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
 around this as I optimized the code in a way to avoid an expensive
 cmpxchg8b.

Ah, by just skipping the thing if the low bits are zero.
 
 I guess if Xen can't be updated to handle an atomic64_read on a pmd in
 the guest, we can add a pmd_read paravirt op? Or if we don't want to
 break the paravirt interface a loop like gup_fast with irq disabled
 should also work but looping + local_irq_disable()/enable() sounded
 worse and more complex than a atomic64_read (gup fast already disables
 irqs because it doesn't hold the mmap_sem so it's a different cost

I am not really sure what is afoot. It sounds like the hypervisor
didn't like somebody reading the high and low bits, but isn't the
pmdval_t still 64-bit? So I would have thought this would
have been triggered before. Or is it that the code in pmd_val never actually
read the high bits (before your addition of the atomic64_read)?

 looping there). AFAIK Xen disables THP during boot, so a check on THP
 being enabled and falling back in the THP=n version of
 pmd_read_atomic, would also be safe, but it's not so nice to do it
 with a runtime check.

The thing is that I did install a 32-bit PAE guest (Fedora) on a Fedora
17 dom0. So it looks like reading the high part is fixed on the newer
hypervisors, but not with the older ones. And the older one is what Amazon EC2
runs, so some .. hack to work around older hypervisors could be added.




-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20120607155647.go9...@phenom.dumpdata.com



Re: [Fwd: Xen performance kernel patches backported to 3.2 for Precise]

2011-12-16 Thread Konrad Rzeszutek Wilk
On Fri, Dec 16, 2011 at 07:06:59AM -0700, Tim Gardner wrote:
 On 12/16/2011 01:27 AM, Ian Campbell wrote:
 On Thu, 2011-12-15 at 20:39 -0700, Tim Gardner wrote:
 On 12/14/2011 11:18 PM, Ben Hutchings wrote:
 Xen performance kernel patches backported to 3.2 for Precise
 
 I'm still kind of waiting on these. Stefan Bader had some concerns and
 encouraged Citrix to work on getting them upstream, but I haven't seen a
 lot of movement on them.
 
 Konrad (CCd) is working on upstreaming these patches. He posted to LKML
 see "Exporting ACPI Pxx/Cxx states to other kernel
 subsystems" (1322673664-14642-1-git-send-email-konrad.w...@oracle.com)
 on 30/11.
 
 The only comment I saw on that posting was from Jan. I saw a comment
 from Stefan to ask why it hadn't been sent upstream which is what
 reminded Konrad to send the post to LKML, were there others?
 
 Ian.
 
 I've seen no substantive comment since Konrad posted the patches,
 which is why I noted above that I haven't seen a lot of movement on
 them. Maybe you should poke some upstream dudes, stir the pot a
 little.

<laughs> Witches brew, eh?

I think a big part of this is that Rafael, Len, and Matthew are all
focused on the ACPI v5.0 drop. Which is supposed to happen _now_ so they are
busy trying to make sure it works without disaster.

Either way, I've some ideas on stirring the pot some more so let
me send an email out.


-- 
Archive: http://lists.debian.org/20111216143319.gc31...@phenom.dumpdata.com



Bug#642154: [Xen-devel] Re: BUG: unable to handle kernel paging request at ffff8803bb6ad000

2011-10-10 Thread Konrad Rzeszutek Wilk
On Sat, Oct 08, 2011 at 10:13:14AM +0400, rush wrote:
 OK, I tried it again, but the Oops didn't go away.
.. snip..
 echo 'Loading Xen 4.0-amd64 ...'
 multiboot   /boot/xen-4.0-amd64.gz placeholder xsave=0
.. snip..
 Was it right?

Yup. I think.. this is a bit embarrassing. It took a bit of time for Intel
folks to get the xsave part right and I remember seeing this error about a
year ago with xsave on a Dell Optiplex 780. Hence I wonder if the fixes that
ultimately went in 4.1.1 did not get ported over to 4.0 and you are just
hitting that.

Can I ask you to do one more thing? Can you upgrade to the xen-4.1.1 in
testing and try with the xsave (or without) and see if it works?

<holds his fingers hoping it is the xsave feature>



-- 
Archive: http://lists.debian.org/20111010164920.ga30...@phenom.oracle.com



Bug#642154: BUG: unable to handle kernel paging request at ffff8803bb6ad000

2011-10-03 Thread Konrad Rzeszutek Wilk
 echo 'Loading Xen 4.0-amd64 ...'
 multiboot   /boot/xen-4.0-amd64.gz placeholder

Oops. I meant to try it in the hypervisor - so right after placeholder add 
xsave=0

 echo 'Loading Linux 3.0.0-1-amd64 ...'
 module  /boot/vmlinuz-3.0.0-1-amd64 placeholder
 root=/dev/mapper/xen-system ro xsave=0 quiet





-- 
Archive: http://lists.debian.org/20111003184722.gb15...@phenom.oracle.com



Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-09-07 Thread Konrad Rzeszutek Wilk
On Wed, Sep 07, 2011 at 02:51:04AM +0100, Ben Hutchings wrote:
 On Mon, 2011-08-29 at 10:08 -0400, Konrad Rzeszutek Wilk wrote:
 [...]
  Oh, I think I know _exactly_ what bug that is:
  
  This git commit:
  280802657fb95c52bb5a35d43fea60351883b2af xen/blkback: When writting 
  barriers set the sector number to zero
  has to be reverted. Specifically:
  
  commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
  Author: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
  Date:   Tue Jul 19 16:44:42 2011 -0700
  
  Revert "xen/blkback: When writting barriers set the sector number to zero..."
 [...]
  and this one added:
  
  25266338a41470a21e9b3974445be09e0640dda7
  xen/blkback: don't fail empty barrier requests
 [...]
 
 Which repository are these in?

Jeremy's: git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
 
 Ben.
 
 





-- 
Archive: http://lists.debian.org/20110907122938.ga32...@dumpdata.com



Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS

2011-09-07 Thread Konrad Rzeszutek Wilk
 Looking at this again: this problem only really applies to dom0, and the
 new code won't even build in a domU-only kernel config with
 CONFIG_X86_IO_APIC unset.  I think we actually need something like:

Ok, that is Ok I think? We don't care about domU for this?

Or is it that it will cause bootup issues _with_ domU's that are built
as UP? That is not the case - as smp.c won't even be built.

I am not sure what the concern here is...



-- 
Archive: http://lists.debian.org/20110907170105.gh32...@dumpdata.com



Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS

2011-09-07 Thread Konrad Rzeszutek Wilk
 As I understand it, the kernel won't work in dom0 if the (PV) IOAPIC is
 disabled.  CONFIG_XEN_DOM0 depends on CONFIG_X86_IO_APIC and we're now
 trying to catch the case where IOAPIC support is disabled at boot.
 
 However, in domU, IOAPIC support is not required (right?).  CONFIG_XEN

Yup.
 does not depend on CONFIG_X86_IO_APIC, so the following configuration
 is possible:
 
 CONFIG_SMP=y
 CONFIG_XEN=y
 # CONFIG_XEN_DOM0 is not set
 # CONFIG_X86_IO_APIC is not set
 
 And with this configuration the test for disabled IOAPIC support would
 fail to compile.

I see what you mean... except I can't get make to do this. Can you send me
the .config where you get the failure please?
Will prep a patch for this, which is just going to guard the usage
of 'ioapic_setup' with '#ifdef CONFIG_X86_IO_APIC'
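
A minimal user-space sketch of the kind of guard described above, assuming the flag in question is the skip_ioapic_setup variable used in the earlier diffs. The function name is hypothetical and the flag is a local stand-in; the only point is that the check compiles away when CONFIG_X86_IO_APIC is not defined, so a domU-only configuration still builds.

```c
#include <stddef.h>

/*
 * Stand-in for the kernel's skip_ioapic_setup flag; in this sketch it only
 * exists when CONFIG_X86_IO_APIC is defined, mirroring the build problem.
 */
#ifdef CONFIG_X86_IO_APIC
static int skip_ioapic_setup = 1;
#endif

/* Returns the panic text to use, or NULL when there is nothing to report. */
static const char *xen_ioapic_conflict(unsigned int max_cpus)
{
#ifdef CONFIG_X86_IO_APIC
    if (skip_ioapic_setup)
        return (max_cpus == 0) ?
            "The nosmp parameter is incompatible with Xen; "
            "use Xen dom0_max_vcpus=1 parameter" :
            "The noapic parameter is incompatible with Xen";
#else
    (void)max_cpus;    /* no IOAPIC support built in: nothing to check */
#endif
    return NULL;
}
```

Built without -DCONFIG_X86_IO_APIC the function always returns NULL; built with it, the message choice follows max_cpus as in the proposed patch.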




-- 
Archive: http://lists.debian.org/20110907194503.ga18...@dumpdata.com



Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS

2011-09-01 Thread Konrad Rzeszutek Wilk
On Wed, Aug 31, 2011 at 09:01:40AM +0100, Ian Campbell wrote:
 On Tue, 2011-08-30 at 10:22 -0400, Konrad Rzeszutek Wilk wrote:
 
  It might make sense to also use 'xen_raw_printk' as sometimes you don't
  get to see the panic - you end up with this unhelpful message:
  
  (XEN) domain_crash_sync called from entry.S
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  .. snip..
  
  so something like this:
 
 Fine by me, although I do wonder if maybe we shouldn't be fixing panic()
 itself or our console driver or something, this isn't the first such
 patch I've noticed which doubles up on the panic message. Is the
 underlying issue just that earlyprintk isn't on by default?

Yup. earlyprintk=xen would do the same thing.

Added this patch on the 3.1-rcX train with your Acked-by.



-- 
Archive: http://lists.debian.org/20110901135334.gc23...@dumpdata.com



Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS

2011-08-30 Thread Konrad Rzeszutek Wilk
On Tue, Aug 30, 2011 at 09:04:30AM +0100, Ian Campbell wrote:
 On Mon, 2011-08-29 at 12:55 +0100, Ben Hutchings wrote:
  On Mon, 2011-08-29 at 10:07 +0400, Константин Алексеев wrote:
   I think this bug may be closed.
   I posted it to xen devel list and get answer:
   It's really an unsupported configuration. If you want to limit dom0 vcpus
   then dom0_max_vcpus= on Xen command line is the correct way.
   
   http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00665.html
  
  Maybe we should panic in this case?
 
 It's a bit sad but yes I think that would be better than leaving traps
 for the unwary given that the issue is unlikely to bubble up most Xen
 developers' todo list any time soon
 
 Your use of skip_ioapic_setup clued me into the probable difference
 between nosmp and dom0_max_vcpus=1 -- the disabling of IOAPIC most
 likely matters to Xen. Konrad does that sound right? 

Yes.
 
Something like this (untested):
 
 Looks plausible to me.
 
  
  diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
  index e79dbb9..2671b96 100644
  --- a/arch/x86/xen/smp.c
  +++ b/arch/x86/xen/smp.c
  @@ -21,6 +21,7 @@
   #include <asm/desc.h>
   #include <asm/pgtable.h>
   #include <asm/cpu.h>
  +#include <asm/io_apic.h>
   
   #include <xen/interface/xen.h>
   #include <xen/interface/vcpu.h>
  @@ -207,6 +208,12 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
   	unsigned cpu;
   	unsigned int i;
   
  +	if (skip_ioapic_setup)
  +		panic((max_cpus == 0) ?
  +		      "The nosmp parameter is incompatible with Xen; "
  +		      "use Xen dom0_max_vcpus=1 parameter" :
  +		      "The noapic parameter is incompatible with Xen");
  +

It might make sense to also use 'xen_raw_printk' as sometimes you don't
get to see the panic - you end up with this unhelpful message:

(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
.. snip..

so something like this:

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b4533a8..8424dd4 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -32,6 +32,7 @@
 #include <xen/page.h>
 #include <xen/events.h>
 
+#include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"
 
@@ -207,6 +208,15 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 	unsigned cpu;
 	unsigned int i;
 
+	if (skip_ioapic_setup) {
+		char *m = (max_cpus == 0) ?
+			"The nosmp parameter is incompatible with Xen; " \
+			"use Xen dom0_max_vcpus=1 parameter" :
+			"The noapic parameter is incompatible with Xen";
+
+		xen_raw_printk(m);
+		panic(m);
+	}
 	xen_init_lock_cpu(0);
 
 	smp_store_cpu_info(0);



--
Archive: http://lists.debian.org/20110830142232.gh11...@dumpdata.com



Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-29 Thread Konrad Rzeszutek Wilk
On Fri, Aug 26, 2011 at 06:58:34PM -0400, Gedalya wrote:
 
 One way to make sure that is not the case is to disable barriers in the
 guest. Meaning in /etc/fstab have something like this:
 
 /dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
 
 That seems to fix it. It was remounting as read only either during
 the boot process or immediately after, and now it boots up and seems
 to stay up. I'll test later with a DomU that actually has things
 running.

Yeeey!
 
 This also fixes the reboot problem I noted earlier, init 6 now
 reboots the DomU rather than destroying it.
 
 
 The other question is what version of Dom0 are you running? Is it 2.6.32?
 2.6.39?
 squeeze, running linux-image-2.6.32-5-xen-amd64  2.6.32-35

Oh, I think I know _exactly_ what bug that is:

This git commit:
280802657fb95c52bb5a35d43fea60351883b2af xen/blkback: When writting barriers 
set the sector number to zero
has to be reverted. Specifically:

commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
Author: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Date:   Tue Jul 19 16:44:42 2011 -0700

Revert "xen/blkback: When writting barriers set the sector number to zero..."

This reverts commit 280802657fb95c52bb5a35d43fea60351883b2af.  This patch
is reported to cause disk corruption:

From: Huang2, Wei wei.hua...@amd.com

We recently found a disk corruption issue with SLES11 SP1 guest. Basically
the guest disk becomes non-bootable after guest shutdown. This is a SLES
specific issue as we didn’t see on other Linux and Windows VMs. Here
is the configuration:



1.  Xen: xen-4.1-testing, changeset 23096

2.  Dom0: Jeremy’s latest pvops 6d94b75 (June 1)

3.  VM: SLES 11 SP1, installed as physical machine with raw disk format



Regarding the disk before corruption, “file sles11sp1.img” command
read: “/root/guests/sles11-sp1/sles11sp1.img: x86 boot sector;
partition 1: ID=0x82, starthead 1, startsector 63, 4208967 sectors;
partition 2: ID=0x83, active, starthead 0, startsector 4209030,
16755795 sectors”. After corruption, it became a data file:
“/root/guests/sles11-sp1/sles11sp1.img: data”.


and this one added:

25266338a41470a21e9b3974445be09e0640dda7
xen/blkback: don't fail empty barrier requests

The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero"
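
A small user-space sketch of why a sector number of -1 trips a size check once it is treated as an unsigned 64-bit quantity, and of what letting empty barrier requests through looks like. This is an illustration under assumptions, not the blkback code: the function names and the exact form of the range check are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range check as a backend might apply it: does the request fit on the disk? */
static bool request_fits(uint64_t sector, uint64_t nr_sects, uint64_t disk_sects)
{
    return sector <= disk_sects && nr_sects <= disk_sects - sector;
}

/* Variant that lets empty (zero-length) barrier requests through unchecked. */
static bool barrier_request_ok(uint64_t sector, uint64_t nr_sects, uint64_t disk_sects)
{
    if (nr_sects == 0)
        return true;    /* nothing is read or written, so the sector is moot */
    return request_fits(sector, nr_sects, disk_sects);
}
```

With sector = (uint64_t)-1, which equals UINT64_MAX, request_fits() rejects the request for any realistic disk size even when it carries no data, while barrier_request_ok() accepts it and still rejects a non-empty request at that sector.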




--
Archive: http://lists.debian.org/20110829140849.ga3...@dumpdata.com



Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen

2011-08-26 Thread Konrad Rzeszutek Wilk
On Thu, Aug 25, 2011 at 07:47:08AM +0100, Ian Campbell wrote:
 Hi Konrad,
 
 Does this look at all familiar? There is some more info in the full bug
 log at http://bugs.debian.org/637234 . In particular, contrary to the
 message below, the user subsequently confirmed that the issue appears to
 be Xen specific (doesn't happen on native or vmware) and that it arose
 between 2.6.39-2-686-pae and 3.0.0-1-686-pae.
 
 Could it be related to edf6ef59ec7e xen-blkfront: Introduce
 BLKIF_OP_FLUSH_DISKCACHE support? That looks like the only pertinent
 change between 2.6.39 and 3.0.


It shouldn't - from the look of it:
[0.529412] blkfront: xvdc: barrier: enabled

it looks as if the 'feature-barrier' is used. Not 'feature-flush-cache' -
otherwise you would have seen a message about that.

But then.. 3.0 (and 2.6.39) don't do barriers anymore. However the backend
seems to do it. And my understanding is that the barrier request
is a superset of a flush request so it should work. But maybe that is
an incorrect assumption.

One way to make sure that is not the case is to disable barriers in the
guest. Meaning in /etc/fstab have something like this:

/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1


The other question is what version of Dom0 are you running? Is it 2.6.32?
2.6.39?



-- 
Archive: http://lists.debian.org/20110826175317.ga5...@dumpdata.com



Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]

2011-08-24 Thread Konrad Rzeszutek Wilk
On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
 @xen-devel:
 
 Does this look familiar to anyone, this is (I expect, hopefully Giuseppe
 will confirm) from Debian Squeeze which has a Xen 4.0.x with a PVops
 dom0 kernel based on xen.git from last summer (e73f4955a821) with more
 recent upstream longterm kernels (up to and including 2.6.32.41) merged
 in. While it does seem to have the switch from level to edge triggered
 interrupt the Debian kernel doesn't appear to have the switch to fasteoi
 for pirqs (0672fb44a111 plus a few followups) -- could that be related
 to this? (I'm not sure if that was a cleanup or a fix)

It was a fix. We had some interrupts getting wedged - but I don't recall
the stack exactly. But there are some follow-ups - like
e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
7e186bdd0098b34c69fb8067c67340ae610ea499

 
 Might the tsc unstable message be relevant?

Hm, not sure. I keep on getting those on my guests but life seems to go on.


The interesting thing about the stack trace is that it looks similar to:

http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979

which has some fixes https://patchwork.kernel.org/patch/1091772/
but they may not help.



-- 
Archive: http://lists.debian.org/20110824202400.ga27...@dumpdata.com



Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-22 Thread Konrad Rzeszutek Wilk
 Thanks. So since the Debian kernel has DRM/TTM from 2.6.33 I assume I
 want the NEEDS_IOREMAP (95518271) version.
 
 I'm about to try my backport of devel/ttm.pci-api-v2 which contains:
 drm/ttm: Add ttm_tt_free_page
 ttm: Introduce a placeholder for DMA (bus) addresses.
 ttm: Utilize the dma_addr_t array for pages that are to in DMA32 pool.
 ttm: Expand (*populate) to support an array of DMA addresses.
 radeon/ttm/PCIe: Use dma_addr if TTM has set it.
 nouveau/ttm/PCIe: Use dma_addr if TTM has set it.
 radeon/PCIe: Use the correct index field.
 plus:
 9551827190db ttm: Set VM_IO only on pages with 
 TTM_MEMTYPE_FLAG_NEEDS_IOREMAP set.
 c54d5aa10b7a ttm: Change VMA flags if they != to the TTM flags.
 c07fbfd17e61 fbmem: VM_IO set, but not propagated

Looks good.

 d541daf6b956 pvops: make pte_flags() go via pvops

I've only hit that on a machine with a P4 Prescott with AGP. On nothing else - so
it might not be required... If you don't have it you just get a bunch of WARN.

 
 In addition the Debian kernel already contains 
 25021c9 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO 
 vmas
 2eb6682 drm: recompute vma->vm_page_prot after changing vm_flags
 
 pvops: make pte_flags() go via pvops was the only bit of the patches
 which were omitted from the Debian kernel (the revert of bcf16b6b4f34)
 which didn't already appear to have been replaced by the other patches
 (ignoring all the AGP stuff) so I figured I may as well give it a go.
 
 My previous attempt (with all of the above except make pte_flags()
 go via pvops) failed because I botched the backport of radeon/PCIe:
 Use the correct index field. and only fixed one of the wrong indexes.
 FWIW I think that patch should be folded down into the original patch
 for upstreaming.

Yeah, good idea. And also actually put my SOB on them.



-- 
Archive: http://lists.debian.org/20101222183522.ga28...@dumpdata.com



Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-21 Thread Konrad Rzeszutek Wilk
 FWIW I ran a patched kernel up on my home machine (radeon) and it didn't
 work. Without KMS the X server failed reasonably gracefully (with some,
 presumably spurious, message about the keyboard driver) and with KMS it
 switched graphics mode and then hung on a black screen.
 
 I'll keep poking but I'm hampered a bit by my only suitable test
 machine actually being my home workstation.

That was not the machine with the AGP card, right?

Did you get these patches in too:

25021c9 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
2eb6682 drm: recompute vma->vm_page_prot after changing vm_flags
dbbc947 ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_FIXED set.

 
 I suppose I should poke through 2.6.33..2.6.37-rc and see if anything
 jumps out for backporting.
 
   Did the series make any waves upstream? What are the chances that it
   will go upstream in something roughly like its current form?
  
  I hope so. I am putting the polishing touches on item c) to have it ready 
  for upstream.
 
 Cool.
 
 Hrm, do I need some equivalent of c) in order to have a chance of this
 stuff working?

Yes, and those three I mentioned earlier should suffice as a temporary solution.

Or you can go straight ahead and look at devel/p2m-identity (however, there is a bug
in them - ballooning in huge amounts of memory does not work right).



-- 
Archive: http://lists.debian.org/20101221162544.gc3...@dumpdata.com



Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-20 Thread Konrad Rzeszutek Wilk
 Then I got to eba164ec7e69 radeon/nouveau/ttm/AGP: Use dma_addr if TTM
 has set it. which complained:
   CC [M]  drivers/gpu/drm/ttm/ttm_agp_backend.o
 drivers/gpu/drm/ttm/ttm_agp_backend.c: In function ‘ttm_agp_populate’:
 drivers/gpu/drm/ttm/ttm_agp_backend.c:66: error: ‘struct agp_memory’ 
 has no member named ‘dma_addr’
 and indeed the field is missing both in 2.6.32+drm33 and Linus' tree. Do
 I need to cherry pick something from another series or is this commit

You can drop that patch. I've rebased the tree to:

devel/ttm.pci-api-v2

which is exactly like the older one, except that it is missing that patch.

 something which should be ignored per our previous discussion about PCIe
 vs AGP etc? (I'm going with the second option for now) 

Yup.
 
 I'll publish my backport in a git tree once I'm happy with it, I need to
 tidy it up and correct the cherry-picked from comments etc and then
 actually build something which uses it. I'll make Debian packages
 available for wider testing once I've done that (with Xmas coming up I
 don't know when that will actually be).
 
 Did the series make any waves upstream? What are the chances that it
 will go upstream in something roughly like its current form?

I hope so. I am putting the polishing touches on item c) to have it ready for upstream.



--
Archive: http://lists.debian.org/20101220164224.ga15...@dumpdata.com



Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-07 Thread Konrad Rzeszutek Wilk
On Tue, Dec 07, 2010 at 11:49:14AM +, Ian Campbell wrote:
 On Mon, 2010-12-06 at 19:27 -0500, Konrad Rzeszutek Wilk wrote:
  a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the
     PCI API. Only the i915 and higher are using the PCI API and I've some
     of the older boxes with i860 so can actually test it.
  
  I've posted patches to address this (https://lkml.org/lkml/2010/12/6/480)
  and Dave's question is why anyone cares about AGP in 2010.
  
  I was wondering if any folks could comment?
 
 His general principle of fixing the modern stuff first and then working
 backwards until nobody is complaining any more seems pretty sane to me.

<nods>
 
 Is the series at https://lkml.org/lkml/2010/12/6/516  sufficient in its
 own right to make Nouveau and ATI work or is more needed? What about NV?

Both Nouveau and ATI (PCIe) look to work. I did light testing (ATI ES1000,
Radeon 3450, Nvidia 65.. something) and will need to do some more
aggressive ones. Oh, and Intel GTT seems to work without any of these
patches - but I've only tested it on a machine with 4GB so I need to add
more memory to make sure.

 
 More generally if we were to take the series from
 https://lkml.org/lkml/2010/12/6/516 but not the series from
 https://lkml.org/lkml/2010/12/6/480 which sets of cards would we be
 including/excluding support for?

 PCIe = supported
 PCI = not supported
 AGP = not supported.
 
 Ian.
 -- 
 Ian Campbell
 Current Noise: Mistress - 38
 
 What's done to children, they will do to society.



-- 
Archive: http://lists.debian.org/20101207164724.ga5...@dumpdata.com



Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-07 Thread Konrad Rzeszutek Wilk
 Dave's concerns seemed mainly to be about AGP bits rather than PCI, are

Yeah, his concerns are valid: why touch it if nobody is using it. And
if there truly aren't enough folks interested in AGP support, then
I am fine dropping it.

 they independent(-ish)? e.g. is only a subset of the .../480 series is

The PCI cards I am talking about are the .. PCI Matrox G400 or the
like, so even older than AGP cards.

 needed to enable PCI support?

You know, I might not have actually posted a patch for this. It was in one
of those DRM scatter-gather code paths. <shrugs>



-- 
Archive: http://lists.debian.org/20101207171848.ga5...@dumpdata.com



Bug#604096: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-12-06 Thread Konrad Rzeszutek Wilk
  a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the PCI API.
     Only the i915 and higher are using the PCI API and I've some of the older
     boxes with i860 so can actually test it.

I've posted patches to address this (https://lkml.org/lkml/2010/12/6/480)
and Dave's question is why anyone cares about AGP in 2010.

I was wondering if any folks could comment?



-- 
Archive: http://lists.debian.org/20101207002754.ga31...@dumpdata.com



Bug#604096: Bug#602418: #601341, #602418 and #604096 seem to be duplicates

2010-11-29 Thread Konrad Rzeszutek Wilk

.. snip of back-history..
  Thanks for the pointers.
  
  I agree with Bastian that some of these changes are really quite nasty.
  Do you and other Xen developers have any plan for how to fix the GART
  and TTM mapping problems in a cleaner way as Xen dom0 support goes
  upstream?
 
 I know that there is a plan to get rid of the _PAGE_IOMAP stuff
 altogether by simply arranging for a 1-1 mapping for the relevant device
 PFNs in the P2M array. However I'm not sure whether or not this
 knocks-on into a fix for the GART/TTM stuff.

Unfortunately it won't fit the whole bill. What some of those patches
did was introduce a mechanism to use the PCI API to do virt_to_phys.

And when I say use, I mean that really loosely. The solution I cobbled
together was to bypass using any API and just hard-code the phys-bus address
lookup. My plan for upstream is to actually work on those drivers
(intel-agp.c, agpgart.c) to utilize the PCI API.

 
 Konrad, do you have an idea how you plan to solve the GART/TTM issues
 upstream?

Yes, I am working on a set of patches that are cleaner and more upstream-able
than the first revision. Hope to have most of a) and b) done in the next two weeks.

And there are actually three distinct milestones here:

 a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the PCI API.
    Only the i915 and higher are using the PCI API and I've some of the older
    boxes with i860 so can actually test it.
 b) Fix the TTM to use the DMA API.
 c) Lastly, get rid of _PAGE_IOMAP so we don't have to depend on
    radeon/nouveau/etc to set the proper _PAGE_IOMAP on the PFNs/BARs.

 
  If there is no such plan then I would rather disable these drivers than
  make them work temporarily with a hack.

The shape of the stuff that I am going to propose upstream is more refined
and much cleaner. Do you want me to send you an email when I am ready
and have done my testing so you can take a look at it?




-- 
Archive: http://lists.debian.org/20101129151808.ga19...@dumpdata.com



Bug#596419: Acknowledgement (xen-linux-system-2.6.32-5-xen-amd64: causes a system hangup by the shutdown of the system, aacraid (sw raid) involved in hangup)

2010-09-20 Thread Konrad Rzeszutek Wilk
 So, it worked if I ran Dom0 in balloon mode by omitting dom0_mem, or,
 if dom0_mem is specified, then swiotlb=65536 must also be
 specified.

Wow. That implies that AACRAID uses quite a lot of buffers, and looking at the driver
there are a bunch of quirks where it can only do DMA up to 2GB, so that would explain
why it relies on SWIOTLB that much.

Based on what Ian analyzed it really looks like we just ran out of DMA buffers and
the driver didn't retry but just bailed out.

We can narrow down who is using so many buffers by using the attached debug module,
which when loaded will print out who is using what buffers if
CONFIG_DMA_API_DEBUG=y is set.

But the proper workaround is the one you discovered - either raise the SWIOTLB buffer
or raise the memory allocated for Dom0.
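
For a sense of scale, here is a small sketch of the arithmetic behind swiotlb=65536. It assumes the value is a count of I/O TLB slabs and that one slab is 2 KiB (a shift of 11), which is my understanding of kernels of that era; the macro and function names are hypothetical, so check the kernel-parameters documentation for the kernel actually in use.

```c
#include <stdint.h>

/* Assumption: one SWIOTLB slab is 2 KiB (1 << 11), as in kernels of that era. */
#define SLAB_SHIFT 11

/* Total bounce-buffer size in bytes for a given slab count. */
static uint64_t swiotlb_bytes(uint64_t nslabs)
{
    return nslabs << SLAB_SHIFT;
}

/* Same size expressed in MiB. */
static uint64_t swiotlb_mib(uint64_t nslabs)
{
    return swiotlb_bytes(nslabs) >> 20;
}
```

Under those assumptions swiotlb=65536 gives 128 MiB of bounce buffers, and 32768 slabs would give 64 MiB.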

 
 I have noticed one interesting behavior - during the successful suspension
 of the domains during the shutdown the first one which is being suspended
 writes three dots very fast, then it stops writing dots for some time
 and then after some time a lot of (possibly all remaining)
 dots are written to the screen very fast. For the following suspensions the
 suspension works continuously dot-by-dot smoothly without any delays. It looks
 like it waits for something during the first suspension (memory allocation?).

That usually means that it is stuck waiting for the disks to write out all the data.
 
 Generally, it is very surprising to me how the aacraid module works; I am
 no C or kernel developer but I would expect something like this cannot
 happen - the module should allocate its necessary memory in the start or, I
 would understand there can fail some specific read or write operation if the
 sw raid has not enough memory to execute them, but I would never expect this
 will lead to the hangup and freeze of the whole system. The probability of

Well, to be honest, we engineers aren't known for testing all of the failure paths
as well as we should. That is why folks like you are quite helpful in finding
bugs :-)
/*
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License v2.0 as published by
 * the Free Software Foundation
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/stat.h>
#include <linux/err.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/limits.h>
#include <linux/device.h>
#include <linux/pci.h>
#include <linux/blkdev.h>
#include <linux/device.h>

#include <linux/init.h>
#include <linux/mm.h>
#include <linux/fcntl.h>
#include <linux/slab.h>
#include <linux/kmod.h>
#include <linux/major.h>
#include <linux/smp_lock.h>
#include <linux/highmem.h>
#include <linux/blkdev.h>
#include <linux/module.h>
#include <linux/blkpg.h>
#include <linux/buffer_head.h>
#include <linux/mpage.h>
#include <linux/mount.h>
#include <linux/uio.h>
#include <linux/namei.h>
#include <asm/uaccess.h>

#include <linux/pagemap.h>
#include <linux/pagevec.h>

#include <linux/dma-debug.h>

#define DUMP_DMA_FUN  "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <kon...@virtualiron>");
MODULE_DESCRIPTION("dump dma");
MODULE_LICENSE("GPL");
MODULE_VERSION(DUMP_DMA_FUN);

static int __init dump_dma_init(void)
{
	debug_dma_dump_mappings(NULL);
	return 0;
}

static void __exit dump_dma_exit(void)
{
}

module_init(dump_dma_init);
module_exit(dump_dma_exit);
# Comment/uncomment the following line to disable/enable debugging
#DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
  DEBFLAGS = -O -g # -O is needed to expand inlines
else
  DEBFLAGS = -O2
endif

EXTRA_CFLAGS += $(DEBFLAGS) -I$(LDDINCDIR)

ifneq ($(KERNELRELEASE),)
# call from kernel build system

obj-m   := dump_dma.o

else

#KERNELDIR ?= /lib/modules/$(shell uname -r)/build
KERNELDIR ?= /home/konrad/git/neb.64/linux-build
PWD   := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) LDDINCDIR=$(PWD)/../include modules

endif

clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions

depend .depend dep:
	$(CC) $(CFLAGS) -M *.c > .depend


ifeq (.depend,$(wildcard .depend))
include .depend
endif


Re: CONFIG_SYSFS_DEPRECATED_V2 is not set for linux-image-2.6.32-5-xen-amd64

2010-07-27 Thread Konrad Rzeszutek Wilk
On Tue, Jul 27, 2010 at 07:26:07AM +0100, Ian Campbell wrote:
 On Tue, 2010-07-27 at 00:47 +0100, Ben Hutchings wrote:
  On Tue, 2010-07-27 at 01:04 +0200, Bart Verwilst wrote:
   Hi
   
   I am not on the list, so please forgive me and put me in CC :)
   I'm trying to boot an Ubuntu Lucid from linux-image-2.6.32-5-xen-amd64
   through the Xen 4.0.1-rc3 hypervisor ( also Debian packages ), which
   boots fine, but then fails to show me the console:
  [...]
   While looking for a reason ( console and stuffs
   should all be fine ), i came across this:
   
   r...@database42:/boot# grep DEPRE config-2.6.32-5-xen-amd64 
   # CONFIG_SYSFS_DEPRECATED_V2 is not set
   CONFIG_ENABLE_WARN_DEPRECATED=y
   
   
   I knew the latest udev needs this to be deprecated, and the Xen docs
   told me the same:
   
   Make sure you have these two set (otherwise your init hangs and udev
   stops working) 
   
   CONFIG_SYSFS_DEPRECATED=y
   CONFIG_SYSFS_DEPRECATED_V2=y
   
   
   What am i missing here? Why does it work for the Debian systems ( i guess :) )
  
  You are confused.  CONFIG_SYSFS_DEPRECATED(_V2)=y means that the
  deprecated entries still appear in sysfs, so the current configuration
  is correct.
  
  I think the deprecated entries used to be required by the administration
  tools that run in dom0, but AFAIK this is no longer true.
 
 I thought the option was there to allow older udev (and/or initrd)
 running on newer kernels? IIRC the initrd hang referred to above was a

Yes. And also multipath and lsscsi.

 bug in RH's nash (used in the initrd) which was triggered by a dir in
 sysfs becoming a symlink (or vice-versa).
 
 I wasn't aware of any Xen specific requirement for those options, the
 suggestion on
 http://wiki.xensource.com/xenwiki/2.6.18-to-2.6.31-and-higher does seem
 overly broad since it only applied to userspace of a particular vintage
 from one distro family. Konrad, what do you think?

Debian does not seem to use nash. I am not sure which versions of multipath
are affected (or whether they carry the patches). I can definitely alter the
wiki text to say: Hey, this is only for RHEL/CentOS users.
 
 Ian.
 -- 
 Ian Campbell
 
 94% of the women in America are beautiful and the rest hang out around here.
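
The grep output quoted earlier in this thread is easy to misread: a line of
the form "# CONFIG_X is not set" means the option is off, while "CONFIG_X=y"
means it is on. What follows is only a minimal userspace sketch in C of how
one might tell the two apart when scanning a file such as
/boot/config-2.6.32-5-xen-amd64. The names kconfig_state, classify_line and
kconfig_lookup are made up for illustration and are not part of any kernel
or Debian tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: how a single option appears in a config file. */
enum kconfig_state {
	KCONFIG_ABSENT,   /* the line does not mention this option */
	KCONFIG_NOT_SET,  /* "# CONFIG_X is not set" */
	KCONFIG_SET       /* "CONFIG_X=y" or "CONFIG_X=m" */
};

/*
 * Classify one line of a kernel config file for the option `name`,
 * e.g. "CONFIG_SYSFS_DEPRECATED_V2". The match is exact, so asking for
 * CONFIG_SYSFS_DEPRECATED does not match CONFIG_SYSFS_DEPRECATED_V2.
 */
enum kconfig_state classify_line(const char *line, const char *name)
{
	size_t n = strlen(name);

	if (strncmp(line, "# ", 2) == 0) {
		const char *p = line + 2;

		if (strncmp(p, name, n) == 0 &&
		    strncmp(p + n, " is not set", 11) == 0)
			return KCONFIG_NOT_SET;
		return KCONFIG_ABSENT;
	}

	if (strncmp(line, name, n) == 0 && line[n] == '=' &&
	    (line[n + 1] == 'y' || line[n + 1] == 'm'))
		return KCONFIG_SET;

	return KCONFIG_ABSENT;
}

/* Scan an already opened config file and report the first match for `name`. */
enum kconfig_state kconfig_lookup(FILE *fp, const char *name)
{
	char line[512];

	while (fgets(line, sizeof(line), fp)) {
		enum kconfig_state s = classify_line(line, name);

		if (s != KCONFIG_ABSENT)
			return s;
	}
	return KCONFIG_ABSENT;
}
```

Fed the two lines from the grep output quoted above, this sketch would
classify CONFIG_SYSFS_DEPRECATED_V2 as KCONFIG_NOT_SET, which matches Ben's
reading that the deprecated sysfs entries are disabled in that kernel.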



-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20100727153146.gb4...@phenom.dumpdata.com