Re: 2.6.12 hangs on boot

2005-07-18 Thread Alexander Y. Fomichev
On Saturday 25 June 2005 02:20, Linus Torvalds wrote:
 On Wed, 22 Jun 2005, Alexander Y. Fomichev wrote:
  I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8
  GHz [ MoBo: Intel E7520, intel 82801 ]
  but kernel hangs on boot right after records:
 
  Booting processor 2/1 rip 6000 rsp 8100023dbf58
  Initializing CPU#2

 Hmm.. Since you seem to be a git user, maybe you could try the git
 bisect thing to help narrow down exactly where this happened (and help
 test that thing too ;).

[skipped]

OK, as far as I can see [and as Andi guessed,
http://bugme.osdl.org/show_bug.cgi?id=4792],
the issue was introduced by the new TSC sync algorithm,
git id: dda50e716dc9451f40eebfb2902c260e4f62cf34.

And yes, it seems to depend on timing...
In my case, a kludge that inserts a short delay (e.g. a printk) between
cpu_set()/mb() and tsc_sync_wait() makes the kernel bootable.

diff -urN b/arch/x86_64/kernel/smpboot.c a/arch/x86_64/kernel/smpboot.c
--- b/arch/x86_64/kernel/smpboot.c  2005-07-17 21:55:55.0 +0400
+++ a/arch/x86_64/kernel/smpboot.c  2005-07-17 21:57:56.0 +0400
@@ -451,6 +451,7 @@
 	cpu_set(smp_processor_id(), cpu_online_map);
 	mb();
 
+	printk(KERN_INFO "We're still here!\n");
 	/* Wait for TSC sync to not schedule things before.
 	   We still process interrupts, which could see an inconsistent
 	   time in that window unfortunately. */

-- 
Best regards.
Alexander Y. Fomichev [EMAIL PROTECTED]
Public PGP key: http://sysadminday.org.ru/gluk.asc
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.12 hangs on boot

2005-07-07 Thread Alexander Y. Fomichev
On Saturday 25 June 2005 02:20, Linus Torvalds wrote:
 On Wed, 22 Jun 2005, Alexander Y. Fomichev wrote:
  I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8
  GHz [ MoBo: Intel E7520, intel 82801 ]
  but kernel hangs on boot right after records:
 
  Booting processor 2/1 rip 6000 rsp 8100023dbf58
  Initializing CPU#2

 Hmm.. Since you seem to be a git user, maybe you could try the git
 bisect thing to help narrow down exactly where this happened (and help
 test that thing too ;).

 You can basically use git to find the half-way point between a set of
 known good points and a known bad point (bisecting the set of
 commits), and doing just a few of those should give us a much better view
 of where things started going wrong.

 For example, since you know that 2.6.12-rc3 is good, and 2.6.12 is bad,
 you'd do

   git-rev-list --bisect v2.6.12 ^v2.6.12-rc3

 where the "v2.6.12 ^v2.6.12-rc3" thing basically means "everything in
 v2.6.12 but _not_ in v2.6.12-rc3" (that's what the ^ marks), and the
 "--bisect" flag just asks git-rev-list to list the middle-most commit,
 rather than all the commits in between those kernel versions.

 You should get the answer 0e6ef3e02b6f07e37ba1c1abc059f8bee4e0847f, but
 before you go any further, just make sure your git index is all clean:

   git status

 should not print anything else than "nothing to commit". If so, then
 you're ready to try the new mid-point head:

   git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 > .git/refs/heads/try1
   git checkout try1

 which will create a new branch called try1, where the head is that
 mid-point, and it will switch to that branch (this requires a fairly
 recent git, btw, so make sure you update your git first).

 Then, compile that kernel, and try it out.

 Now, there are two possibilities: either "try1" ends up being good, or it
 still shows the bug. If it is a buggy kernel, then you now have a new
 "bad" point, and you do

   git-rev-list --bisect try1 ^v2.6.12-rc3 > .git/refs/heads/try2
   git checkout try2

 which is all the same thing as you did before, except now we use try1 as
 the known bad one rather than v2.6.12 (and we call the new branch try2
 of course).

 However, if that try1 is _good_, and doesn't show the bug, then you
 shouldn't replace the other known good case, but instead you should add
 it to the list of good commits (aka commits we don't want to know about):

   git-rev-list --bisect v2.6.12 ^v2.6.12-rc3 ^try1 > .git/refs/heads/try2
   git checkout try2

 ie notice how we now say: want to get the bisection of the commits in
 v2.6.12 (known bad) but _not_ in either of v2.6.12-rc3 or the 'try1'
 branch (which are known good).
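
The good/bad loop described above can be modelled with a toy shell sketch.
It assumes a linear history of commits numbered 1..N (real kernel history
has merges, so git's actual mid-point choice is more involved), and both
FIRST_BAD and is_bad are hypothetical stand-ins for "compile that kernel,
and try it out"; nothing here calls git.

```shell
#!/bin/sh
# Toy model of the procedure: commits are plain numbers on a linear
# history (an assumption made for illustration only).
# FIRST_BAD is a hypothetical culprit commit.
FIRST_BAD=${FIRST_BAD:-1000}

# Hypothetical stand-in for building and booting the kernel at commit $1.
# Returns success (0) when that commit shows the bug.
is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# bisect GOOD BAD
#   GOOD: a commit known not to show the bug
#   BAD:  a commit known to show the bug
# Prints "<first bad commit> <number of kernels tested>".
bisect() {
    good=$1
    bad=$2
    tries=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        tries=$((tries + 1))
        if is_bad "$mid"; then
            bad=$mid     # buggy: this becomes the new "bad" point
        else
            good=$mid    # fine: add it to the known-good side
        fi
    done
    echo "$bad $tries"
}

# Commit 0 plays the role of v2.6.12-rc3, commit 1520 that of v2.6.12.
bisect 0 1520
```

Each pass through the loop corresponds to one "tryN" branch: a bad result
replaces the known-bad end, a good result extends the known-good set.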

 After compiling and testing a few kernels, you will have narrowed the
 range down a _lot_, and at some point you can just say

   git-rev-list --pretty try4 ^v2.6.12-rc3 ^try1 ^try3

 (or however the success/failure pattern ends up being - the above
 example line assumes that try1 didn't have the bug, but try2 did, and
 then try3 was ok again but try4 was buggy), and you'll get a fairly
 small list of commits that are the potential bad ones.

 After the above four tries, you'd have limited it down to a list of 95
 changes (from the original 1520), so it would really be best to try six or
 seven different kernels, but at that point you'd have it down to less than
 20 commits and then pinpointing the bug is usually much easier.
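
The numbers above follow from halving the candidate set on every try. A
rough sketch of that arithmetic, assuming each test splits the remaining
commits exactly in half and the larger half survives (an idealization; a
history with merges will not split this evenly), with remaining_after as a
hypothetical helper name:

```shell
#!/bin/sh
# remaining_after COMMITS TRIES
# Upper bound on the commits still under suspicion after TRIES tests,
# assuming every test halves the set and the larger half is kept.
remaining_after() {
    left=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        left=$(( (left + 1) / 2 ))   # round up: worst-case half
        tries=$((tries - 1))
    done
    echo "$left"
}

for n in 4 6 7; do
    echo "after $n tries: about $(remaining_after 1520 "$n") commits left"
done
```

Under this idealized model, 1520 commits shrink to 95 after four tries, to
24 after six and to 12 after seven, so the seventh kernel is the one that
brings the list under 20.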

 And when you're done, you can just do

   git checkout master

 and you're back to where you started.

   Linus
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

Thank you for your answer. I was on vacation for the last two weeks
and didn't have access to my mail account.
Hmmm... it seems the 'bisect' method is not applicable to this host: it
is a production server, which can tolerate one or two reboots, but 'bisect'
will require many more, I suspect. I have another, non-critical host with
nearly the same hardware where such tests could be done, but I don't have a
serial console on it right now. It will take some time to connect a console
because both of these are remote hosts.

-- 
Best regards.
Alexander Y. Fomichev [EMAIL PROTECTED]
Public PGP key: http://sysadminday.org.ru/gluk.asc